Abstract: This paper studies estimation in functional linear quantile regression in which the dependent variable is scalar while the covariate is a function, and the conditional quantile for each fixed quantile index is modeled as a linear functional of the covariate. Here, we suppose that covariates are discretely observed and sampling points may differ across subjects, where the number of measurements per subject increases with the sample size. Also, we allow the quantile index to vary over a given subset of the open unit interval, so the slope function is a function of two variables: (typically) time and quantile index. Likewise, the conditional quantile function is a function of the quantile index and the covariate. We consider an estimator for the slope function based on the principal component basis. An estimator for the conditional quantile function is obtained by a plug-in method. Since the so-constructed plug-in estimator does not necessarily satisfy the monotonicity constraint with respect to the quantile index, we also consider a class of monotonized estimators for the conditional quantile function. We establish rates of convergence for these estimators under suitable norms, showing that these rates are optimal in a minimax sense under some smoothness assumptions on the covariance kernel of the covariate and the slope function. Empirical choice of the cut-off level is studied by using simulations.
1. Introduction

Quantile regression, initially developed in the seminal work of [29], is one of the most important statistical methods for measuring the impact of covariates on dependent variables. An attractive feature of quantile regression is that it allows us to make inference on the entire conditional distribution by estimating several different conditional quantiles. Some basic material on quantile regression and its applications is summarized in [28].

This paper studies estimation in functional linear quantile regression in which the dependent variable is scalar while the covariate is a function, and the conditional quantile for a fixed quantile index is modeled as a linear functional of the covariate. The model that we consider is an extension of functional linear regression to the quantile regression case. Here, we suppose that covariates are discretely observed and sampling points may differ across subjects, where the number of measurements per subject increases with the sample size. Also, we allow the quantile index to vary over a given subset of the open unit interval, so the slope function is a function of two variables: (typically) time and quantile index. Likewise, the conditional quantile function is a function of the quantile index and the covariate. We consider the problem of estimating the slope function as well as the conditional quantile function itself. The estimator we consider for the slope function is based on principal component analysis (PCA). Expanding the covariate and the slope function in terms of the PCA basis, the model is transformed into a quantile regression model with an infinite number of regressors. Truncating the infinite sum at the first m (say) terms, we may apply a standard quantile regression technique to estimate the first m coefficients in the basis expansion of the slope function at each quantile index, where m diverges with the sample size. In practice, the population PCA basis is unknown, so it is replaced by a suitable estimator. Once the estimator for the slope function is available, an estimator for the conditional quantile is obtained by a plug-in method. Since the so-constructed plug-in estimator does not necessarily satisfy the monotonicity constraint with respect to the quantile index, we also consider a class of monotonized estimators for the conditional quantile function.
In summary, we have the following three types of estimators in mind: 

(i) a PCA-based estimator for the slope function; 

(ii) a plug-in estimator for the conditional quantile function; 

(iii) monotonized estimators for the conditional quantile function. 

We establish rates of convergence for these estimators under suitable norms, 
showing that these rates are optimal in a minimax sense under some smoothness 
assumptions on the covariance kernel of the covariate and the slope function. 

In practice, we have to choose the cut-off level empirically. We suggest some 
criteria, namely (integrated-)AIC, BIC and GACV to choose the cut-off level. 
We study the performance of these criteria by using simulations. In our limited 
simulation experiments, although none of these criteria clearly dominated the 
others, (integrated-)BIC worked relatively stably. 

Functional data have become increasingly important. We refer the reader to [34] for a comprehensive treatment of functional data analysis. Earlier theoretical studies in functional data analysis have focused mainly on functional linear mean regression models [see 9, 10, 37, 6, 22, 14, 27, 39, 15, and references cited in these papers]. Among them, [22] established fundamental results in functional linear mean regression, deriving sharp rates of convergence for a PCA-based estimator for the slope function under some smoothness assumptions. Note that they assumed that covariates are continuously observed. Beyond functional linear mean regression, [33] developed estimation methods for generalized functional linear models using series expansions of covariates and slope functions.

While there are not many, there are some earlier papers on estimating conditional quantiles with function-valued covariates. [7] studied smoothing spline estimators for functional linear quantile regression models, although the rates they established are not sharp. [18] considered nonparametric estimation of conditional quantiles when covariates are functions, which is a different topic from ours. [11] considered an "indirect" estimation of conditional quantiles. They modeled the conditional distribution as the composition of some (possibly unknown) link function and a linear functional of the covariate. They first estimated the conditional distribution function by adapting the method developed in [33] and then estimated the conditional quantile function by inverting the estimated conditional distribution function. In the quantile regression literature, there are two ways to estimate conditional quantiles. One is to directly model conditional quantiles and estimate unknown parameters by minimizing check functions. The other is to estimate conditional distribution functions and invert them to estimate conditional quantile functions. We refer to the former as a "direct" method and to the latter as an "indirect" method. The approach taken in this paper is a direct method, while that of [11] is an indirect method. Note that although their method is flexible, they only established consistency of the estimator.

Conditional quantile estimation offers a variety of fruitful applications for data containing function-valued covariates. A leading example in which conditional quantile estimation with function-valued covariates is useful appears in the analysis of growth data [11]. Suppose that we have a growth data set of girls' heights between ages 1 and 18, say, where multiple measurements may occur at some ages. Use a girl's growth history between ages 1 and 12 as a covariate, and her height at age 18 as a response. Then, conditional quantile estimation gives us an overall picture of the predictive distribution of a girl's height at age 18 given her growth history between ages 1 and 12, which is more informative than knowing only the mean response. In addition to growth data, functional quantile regression has been applied in the analysis of ozone pollution data [8] and El Niño data [18]. We believe that functional linear quantile regression modeling is a benchmark in conditional quantile estimation when covariates are functions, just as linear quantile regression modeling is when covariates are vectors.

Our estimator for the slope function (at a fixed quantile index) can be understood as a regularized solution to an empirical version of a nonlinear ill-posed inverse problem that corresponds to the "normal equation" in the quantile regression case, where the regularization is controlled by the cut-off level. The paper is thus partly related to the literature on statistical nonlinear inverse problems, which is still an active research area [see 4, 26, 31, 12, 19]. In the mean regression case, by contrast, the normal equation becomes a linear ill-posed inverse problem. [22] considered two regularized estimators for the slope function based on the normal equation in the mean regression case. Conceptually, the problems handled in our paper and in theirs differ in nature: nonlinearity versus linearity.

From a technical point of view, establishing sharp rates of convergence for our estimators is challenging. Our proof strategy builds upon the techniques developed in the asymptotic analysis of M-estimators with diverging numbers of parameters [see, for example, 23]. However, an additional complication arises because the "regressors" here are estimated, and the estimation error has to be taken into account, which requires some new techniques. Additionally, discretization errors bring a further technical complication.

Finally, the setting here is similar to Section 3 of [14]: covariates are densely but discretely observed, and the discretization error is taken into account in the analysis. However, because of the technical complications involved, the paper does not cover the case in which covariates are discretely observed with measurement errors. A formal theoretical analysis of that case is left for future work.

The remainder of the paper is organized as follows. Section 2 presents the model and estimators. Section 3 gives the main results, in which we derive rates of convergence for the estimators. Section 4 discusses empirical choice of the cut-off level. Proofs of the main results are given in Sections 5 and 6. Some technical results are provided in the Appendix.

Notation: For $z \in \mathbb{R}^k$, let $\|z\|_{\ell_2}$ denote the Euclidean norm of $z$. For any integer $k \ge 2$, let $\mathbb{S}^{k-1}$ denote the set of all unit vectors in $\mathbb{R}^k$: $\mathbb{S}^{k-1} = \{z \in \mathbb{R}^k : \|z\|_{\ell_2} = 1\}$. For any $y, z \in \mathbb{R}$, let $y \vee z = \max\{y, z\}$ and $y \wedge z = \min\{y, z\}$. Let $1(\cdot)$ denote the indicator function. For any given (random or non-random, scalar or vector) sequence $\{z_i\}_{i=1}^n$, $E_n[z_i] = n^{-1}\sum_{i=1}^n z_i$, which should be distinguished from the population expectation $E[\cdot]$. For any two sequences of positive constants $r_n$ and $s_n$, we write $r_n \asymp s_n$ if the ratio $r_n/s_n$ is bounded and bounded away from zero. Let $L_2[0,1]$ denote the usual $L_2$ space with respect to the Lebesgue measure for functions defined on $[0,1]$. Let $\|\cdot\|$ denote the $L_2$-norm: $\|f\|^2 = \int_0^1 f^2(t)\,dt$. For any finite set $I$, $\mathrm{Card}(I)$ denotes the cardinality of $I$.

2. Methodology 

2.1. Functional linear quantile regression modeling 

Let $(Y, X)$ be a pair of a scalar random variable $Y$ and a random function $X = (X(t))_{t \in T}$ on a bounded closed interval $T$ in $\mathbb{R}$. Without loss of generality, we assume $T = [0,1]$. By "random function", we mean that $X(t)$ is a random variable for each $t \in [0,1]$. We assume a mild regularity condition on the path property of $X$. Let $D[0,1]$ denote the space of all cadlag functions on $[0,1]$, equipped with the Skorohod metric [see 3]. We assume that the map $t \mapsto X(t)$ is cadlag almost surely. Equip $D[0,1]$ with the Borel $\sigma$-field. Then, $X$ can be taken as a $D[0,1]$-valued random variable. Since $D[0,1]$ is a Polish space, and the product space $\mathbb{R} \times D[0,1]$ with the product metric is also Polish, the regular conditional distribution of $Y$ given $X$ exists.

Let $Q_{Y|X}(\cdot \mid X)$ denote the conditional quantile function of $Y$ given $X$. Let $\mathcal{U}$ be a given subset of $(0,1)$ that is away from $0$ and $1$, i.e., for some small $\epsilon \in (0, 1/2)$, $\mathcal{U} \subset [\epsilon, 1-\epsilon]$. For each $u \in \mathcal{U}$, we assume that $Q_{Y|X}(u \mid X)$ can be written as a linear functional of $X$, i.e., for each $u \in \mathcal{U}$, there exist a scalar constant $a(u) \in \mathbb{R}$ and a scalar function $b(\cdot, u) \in L_2[0,1]$ such that
$$Q_{Y|X}(u \mid X) = a(u) + \int_0^1 b(t,u)\,X^c(t)\,dt, \qquad (1)$$
where $X^c(t) = X(t) - E[X(t)]$. Typical examples of $\mathcal{U} \subset (0,1)$ are: (i) $\mathcal{U} = \{u\}$ (singleton); (ii) $\mathcal{U} = \{u_1, \dots, u_k\}$ with $0 < u_1 < \cdots < u_k < 1$ (finite set); (iii) $\mathcal{U} = [u_L, u_U]$ with $0 < u_L < u_U < 1$ (bounded closed interval). Formally, we allow for all these possibilities.

The model (1) is a natural extension of standard linear quantile regression models to function-valued covariates, and was first formulated in [7]. In what follows, we consider estimating the slope function $(t,u) \mapsto b(t,u)$ as well as the conditional quantile function $(u,x) \mapsto Q_{Y|X}(u \mid x)$.



2.2. Estimation strategy 



We base our estimation strategy on principal component analysis (PCA). Define the covariance kernel $K(s,t) = \mathrm{Cov}(X(s), X(t))$. Then, by the Hilbert-Schmidt theorem, $K(s,t)$ admits the spectral expansion
$$K(s,t) = \sum_{j=1}^{\infty} \kappa_j \phi_j(s)\phi_j(t), \qquad \kappa_1 \ge \kappa_2 \ge \cdots \ge 0,$$
where $\{\phi_j\}_{j=1}^{\infty}$ is an orthonormal basis for $L_2[0,1]$. We will later assume that there are no ties in $\kappa_j$, i.e., $\kappa_1 > \kappa_2 > \cdots > 0$. Since $\{\phi_j\}_{j=1}^{\infty}$ is an orthonormal basis for $L_2[0,1]$, we have the following expansions in $L_2[0,1]$:
$$X^c(t) = \sum_{j=1}^{\infty} \xi_j \phi_j(t), \qquad b(t,u) = \sum_{j=1}^{\infty} b_j(u)\phi_j(t),$$
where $\xi_j$ and $b_j(u)$ are defined by
$$\xi_j = \int_0^1 X^c(t)\phi_j(t)\,dt, \qquad b_j(u) = \int_0^1 b(t,u)\phi_j(t)\,dt.$$
Here, $\xi_j$ are called "principal scores" for $X$. The expansion for $X^c$ is called the "Karhunen-Loève expansion". This leads to the expression $\int_0^1 b(t,u)X^c(t)\,dt = \sum_{j=1}^{\infty} b_j(u)\xi_j$. Then, the model (1) is transformed into a quantile regression model with an infinite number of "regressors":
$$Q_{Y|X}(u \mid X) = a(u) + \sum_{j=1}^{\infty} b_j(u)\xi_j, \quad u \in \mathcal{U}. \qquad (2)$$
Note that $E[\xi_j] = 0$, $E[\xi_j^2] = \kappa_j$ and $E[\xi_j \xi_k] = 0$ for all $j \ne k$.

We first consider estimating the slope function $(t,u) \mapsto b(t,u)$. To this end, we estimate the function $b(\cdot,u)$ for each $u \in \mathcal{U}$ and collect them to construct a final estimator for $(t,u) \mapsto b(t,u)$. To explain the basic idea, suppose for a while that (i) $X$ were continuously observable; and (ii) the covariance kernel $K(s,t)$ were known. The problem then reduces to finding suitable estimates of the coefficients $b_j(u)$. Let $(Y_1, X_1), \dots, (Y_n, X_n)$ be independent copies of $(Y, X)$. For each $i = 1, \dots, n$, let $\{\xi_{ij}\}_{j=1}^{\infty}$ be the principal scores for $X_i$. Pick any $u \in \mathcal{U}$. Then, a plausible approach to estimating $b(\cdot,u)$ is to truncate $\sum_{j=1}^{\infty} b_j(u)\xi_{ij}$ by $\sum_{j=1}^{m} b_j(u)\xi_{ij}$ for some large $m$, and estimate only the first $m$ coefficients $b_1(u), \dots, b_m(u)$ using a standard quantile regression technique. Let $m = m_n$ be the "cut-off" level such that $1 \le m \le n-1$ and $m \to \infty$ as $n \to \infty$. Estimate $a(u)$ and the first $m$ coefficients $b_1(u), \dots, b_m(u)$ of $b(\cdot,u)$ by
$$(\tilde a(u), \tilde b_1(u), \dots, \tilde b_m(u)) = \arg\min_{a, b_1, \dots, b_m} E_n\Big[\rho_u\Big(Y_i - a - \sum_{j=1}^m \xi_{ij} b_j\Big)\Big], \qquad (3)$$
where $\rho_u(y) = \{u - 1(y < 0)\}y$ is the check function. Note that $\rho_{0.5}(y) = |y|/2$, so for $u = 0.5$, (3) is equivalent to least absolute deviation regression. Here, recall that $E_n[z_i] = n^{-1}\sum_{i=1}^n z_i$ for any sequence $\{z_i\}_{i=1}^n$. The resulting estimator for the slope function $(t,u) \mapsto b(t,u)$ is given by
$$\tilde b : (t,u) \mapsto \tilde b(t,u), \qquad \tilde b(t,u) = \sum_{j=1}^m \tilde b_j(u)\phi_j(t), \quad t \in [0,1],\ u \in \mathcal{U}.$$

However, this "estimator" is infeasible since (i) $X$ is usually discretely observed; and (ii) $K(s,t)$ is unknown. Because of (i), it is usually not possible to directly estimate the covariance kernel $K(s,t)$ by the empirical one (since $E_n[(X_i(s) - \bar X(s))(X_i(t) - \bar X(t))]$ with $\bar X(t) = n^{-1}\sum_{i=1}^n X_i(t)$ is unavailable for some $(s,t)$). Similarly to [14], we consider the following setting:

1. For each $i = 1, \dots, n$, $X_i$ is only observed at $L_i + 1$ discrete points $0 = t_{i1} < t_{i2} < \cdots < t_{i,L_i+1} = 1$. Typically, $\max_{1 \le i \le n}\max_{1 \le l \le L_i}(t_{i,l+1} - t_{il}) \to 0$ as $n \to \infty$ is assumed.

2. Based on the discrete observations, for each $i = 1, \dots, n$, we construct an interpolated function $\hat X_i = (\hat X_i(t))_{t \in [0,1]}$ for $X_i = (X_i(t))_{t \in [0,1]}$.

Here, we shall use a simple interpolation rule (see also the later remark):
$$\hat X_i(t) = \sum_{l=1}^{L_i} X_i(t_{il})\,1(t \in [t_{il}, t_{i,l+1})), \quad i = 1, \dots, n.$$

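The observed time points $t_{i1}, \dots, t_{i,L_i+1}$ (and $L_i$) should be indexed by the sample size $n$; however, this is suppressed for notational convenience. Suppose now that the interpolated functions $\hat X_1, \dots, \hat X_n$ are obtained. Then, we may estimate the covariance kernel $K(s,t)$ by
$$\hat K(s,t) = E_n[(\hat X_i(s) - \bar X(s))(\hat X_i(t) - \bar X(t))],$$
where now $\bar X(t) = n^{-1}\sum_{i=1}^n \hat X_i(t)$. Let $\hat K(s,t) = \sum_{j=1}^{\infty} \hat\kappa_j \hat\phi_j(s)\hat\phi_j(t)$ be the spectral expansion of $\hat K(s,t)$, where $\hat\kappa_1 \ge \hat\kappa_2 \ge \cdots \ge 0$ and $\{\hat\phi_j\}_{j=1}^{\infty}$ is an orthonormal basis for $L_2[0,1]$. Each principal score $\xi_{ij}$ is estimated by
$$\hat\xi_{ij} = \int_0^1 (\hat X_i(t) - \bar X(t))\hat\phi_j(t)\,dt.$$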
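The covariance, eigen-decomposition and score steps above can be approximated numerically once the interpolated curves are evaluated on a common fine grid. The Python sketch below is one such discretization, not the paper's implementation: it assumes the curves are stored as rows of an array on a grid in $[0,1]$, approximates every integral by the trapezoidal rule, and uses hypothetical helper names (`trap_weights`, `fpca`). Symmetrizing the discretized operator with square-root quadrature weights lets `numpy.linalg.eigh` return real eigenpairs; the eigenfunctions are only determined up to sign.

```python
import numpy as np


def trap_weights(grid):
    """Trapezoidal-rule quadrature weights on an increasing grid in [0, 1]."""
    grid = np.asarray(grid, dtype=float)
    w = np.zeros_like(grid)
    d = np.diff(grid)
    w[:-1] += d / 2
    w[1:] += d / 2
    return w


def fpca(X_hat, grid, m):
    """Estimate kappa_hat_j, phi_hat_j (on the grid) and xi_hat_ij for j = 1..m.

    X_hat: (n, G) array whose row i holds the interpolated curve X_hat_i on `grid`.
    """
    n = X_hat.shape[0]
    w = trap_weights(grid)
    Xc = X_hat - X_hat.mean(axis=0)            # X_hat_i(t) - X_bar(t)
    K = Xc.T @ Xc / n                          # K_hat(s, t) on the grid
    sw = np.sqrt(w)
    # eigenpairs of the discretized operator f -> int K_hat(., t) f(t) dt
    evals, evecs = np.linalg.eigh(sw[:, None] * K * sw[None, :])
    order = np.argsort(evals)[::-1][:m]        # indices of the m largest eigenvalues
    kappa = evals[order]
    phi = evecs[:, order] / sw[:, None]        # orthonormal w.r.t. the quadrature weights
    xi = (Xc * w) @ phi                        # xi_hat_ij = int (X_hat_i - X_bar) phi_hat_j dt
    return kappa, phi, xi


# Illustrative data: two orthonormal cosine components with different variances
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)
basis = np.sqrt(2.0) * np.cos(np.pi * np.outer([1, 2], grid))   # shape (2, 101)
X_hat = (rng.standard_normal((200, 2)) * np.array([1.0, 0.5])) @ basis
kappa, phi, xi = fpca(X_hat, grid, m=2)
```

By construction, the average of $\hat\xi_{ij}^2$ over $i$ equals $\hat\kappa_j$ in this discretization, which is a convenient sanity check.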
Then, the coefficients $a(u)$ and $b_1(u), \dots, b_m(u)$ are estimated by
$$(\hat a(u), \hat b_1(u), \dots, \hat b_m(u)) = \arg\min_{a, b_1, \dots, b_m} E_n\Big[\rho_u\Big(Y_i - a - \sum_{j=1}^m \hat\xi_{ij} b_j\Big)\Big]. \qquad (4)$$
The resulting estimator for the slope function $(t,u) \mapsto b(t,u)$ is given by
$$\hat b : (t,u) \mapsto \hat b(t,u), \qquad \hat b(t,u) = \sum_{j=1}^m \hat b_j(u)\hat\phi_j(t), \quad t \in [0,1],\ u \in \mathcal{U}.$$

The optimization problem (4) can be transformed into a linear programming problem and solved using standard statistical software. Once the estimator for the slope function is obtained, the conditional $u$-quantile of $Y$ given $X = x$ for a given function $x = (x(t))_{t \in [0,1]} \in L_2[0,1]$ is estimated by a plug-in method:
$$\hat Q_{Y|X}(u \mid x) = \hat a(u) + \int_0^1 \hat b(t,u)(x(t) - \bar X(t))\,dt.$$

Empirical choice of the cut-off level will be discussed in Section 4. 
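To make the linear programming remark concrete, here is a minimal sketch, assuming SciPy's `scipy.optimize.linprog` with the HiGHS solver is available (the numerical work in Section 4 used Ox code instead). Writing each residual as the difference of its positive and negative parts turns the check-function objective into a linear objective with equality constraints. The function names are hypothetical, and the dense constraint matrix is only suitable for small $n$.

```python
import numpy as np
from scipy.optimize import linprog


def check_loss(y, u):
    """Check function rho_u(y) = (u - 1(y < 0)) * y."""
    y = np.asarray(y, dtype=float)
    return (u - (y < 0)) * y


def quantile_regression_lp(Y, Z, u):
    """Minimize E_n[rho_u(Y_i - a - Z_i'b)] over (a, b) as a linear program.

    Y: (n,) responses; Z: (n, m) regressors, e.g. the estimated scores xi_hat_ij.
    The residual is written as r_plus - r_minus with r_plus, r_minus >= 0, so that
    rho_u(residual) = u * r_plus + (1 - u) * r_minus at the optimum.
    """
    Y = np.asarray(Y, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(len(Y), -1)
    n, m = Z.shape
    # decision variables: [a, b_1..b_m, r_plus_1..r_plus_n, r_minus_1..r_minus_n]
    c = np.concatenate([np.zeros(m + 1), np.full(n, u / n), np.full(n, (1 - u) / n)])
    A_eq = np.hstack([np.ones((n, 1)), Z, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * (m + 1) + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=Y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[0], res.x[1:m + 1]


def plug_in(a_hat, b_hat, xi_new):
    """Q_hat(u | x) = a_hat(u) + sum_j b_hat_j(u) * xi_hat_j(x),
    where xi_hat_j(x) = int (x - X_bar) phi_hat_j dt."""
    return a_hat + np.asarray(xi_new, dtype=float) @ np.asarray(b_hat, dtype=float)


# Noise-free check: every conditional quantile of Y = 1 + 2 z equals 1 + 2 z
z = np.linspace(0.0, 1.0, 11)
a_hat, b_hat = quantile_regression_lp(1.0 + 2.0 * z, z[:, None], u=0.3)
q_new = plug_in(a_hat, b_hat, [0.5])
```

The plug-in step uses the fact that, by orthonormality of the $\hat\phi_j$, $\int_0^1 \hat b(t,u)(x(t) - \bar X(t))\,dt = \sum_{j=1}^m \hat b_j(u)\int_0^1 (x(t) - \bar X(t))\hat\phi_j(t)\,dt$.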

The basis $\{\phi_j\}_{j=1}^{\infty}$ is called the (population) PCA basis. Alternatively, one may use other basis functions independent of the data, such as Fourier and wavelet bases, in which case the analysis becomes more tractable. A potential drawback of using such basis functions is that, as discussed in [15], using the "first" $m$ basis functions is less well motivated. The PCA basis is a benchmark basis in functional data analysis, which is why it is used in this paper. Other estimation methods such as smoothing splines [14] and a reproducing kernel Hilbert space approach [39] could be adapted to the quantile regression case, which is left as a future topic.

The interpolation rule used here may be replaced by any other reasonable interpolation rule. For example, a plausible alternative is to use
$$\check X_i(t) = \sum_{l=1}^{L_i} \frac{X_i(t_{il}) + X_i(t_{i,l+1})}{2}\,1(t \in [t_{il}, t_{i,l+1})), \quad i = 1, \dots, n.$$
It is not hard to see that the theory below also applies to this interpolation rule. In practice, this rule may be preferable since it uses all the discrete observations $X_i(t_{i1}), \dots, X_i(t_{i,L_i+1})$.
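Both interpolation rules are piecewise constant on the cells $[t_{il}, t_{i,l+1})$ and differ only in the value assigned to each cell. A minimal Python sketch for a single curve follows; the function name `interpolate` and its `rule` argument are hypothetical. It follows the displayed formulas literally, so at $t = 1$ the indicator sum is empty and the interpolant equals 0, which does not affect any integral.

```python
import numpy as np


def interpolate(t_obs, x_obs, grid, rule="step"):
    """Piecewise-constant interpolation of one discretely observed curve.

    t_obs: increasing points 0 = t_1 < ... < t_{L+1} = 1
    x_obs: observed values X_i(t_l) at those points
    grid:  evaluation points in [0, 1]
    rule:  "step"    -> X_i(t_l) on [t_l, t_{l+1})
           "average" -> (X_i(t_l) + X_i(t_{l+1})) / 2 on [t_l, t_{l+1})
    """
    t_obs = np.asarray(t_obs, dtype=float)
    x_obs = np.asarray(x_obs, dtype=float)
    grid = np.asarray(grid, dtype=float)
    L = len(t_obs) - 1
    # 0-based cell index l with t_obs[l] <= t < t_obs[l + 1]
    idx = np.searchsorted(t_obs, grid, side="right") - 1
    inside = (idx >= 0) & (idx < L)
    out = np.zeros_like(grid)                  # empty indicator sum (t = 1) gives 0
    if rule == "step":
        out[inside] = x_obs[idx[inside]]
    elif rule == "average":
        out[inside] = 0.5 * (x_obs[idx[inside]] + x_obs[idx[inside] + 1])
    else:
        raise ValueError(f"unknown rule: {rule}")
    return out


t_obs = [0.0, 0.5, 1.0]
x_obs = [1.0, 3.0, 5.0]
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
step = interpolate(t_obs, x_obs, grid, rule="step")      # [1, 1, 3, 3, 0]
avg = interpolate(t_obs, x_obs, grid, rule="average")    # [2, 2, 4, 4, 0]
```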

2.3. Connection to nonlinear ill-posed inverse problems 

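For any fixed $u \in \mathcal{U}$, our estimator $\hat b(\cdot,u)$ can be understood as a regularized solution to an empirical version of a nonlinear inverse problem that corresponds to the "normal equation":
$$A(u, b(\cdot,u)) = 0, \qquad (5)$$
where the map $A : \mathcal{U} \times L_2[0,1] \to L_2[0,1]$ is defined by
$$A(u,g)(\cdot) = E\Big[\Big\{u - 1\Big(Y \le \int_0^1 g(t)X^c(t)\,dt\Big)\Big\}X^c(\cdot)\Big] = E\Big[\Big\{u - F_{Y|X}\Big(\int_0^1 g(t)X^c(t)\,dt \,\Big|\, X\Big)\Big\}X^c(\cdot)\Big], \quad u \in \mathcal{U},\ g \in L_2[0,1].$$
Here, $F_{Y|X}(y \mid X)$ denotes the conditional distribution function of $Y$ given $X$. For the sake of simplicity, we have ignored the constant term. Observe that for any fixed $u \in \mathcal{U}$, the map $A(u,\cdot) : L_2[0,1] \to L_2[0,1]$ is a nonlinear operator. In fact, using the approximation $X_i^c \approx \sum_{j=1}^m \hat\xi_{ij}\hat\phi_j =: \hat X_i^c$, our estimator $\hat b(\cdot,u)$ is an approximate solution to an empirical version of (5) over the linear subspace spanned by $\{\hat\phi_1, \dots, \hat\phi_m\}$:
$$\hat A(u, \hat b(\cdot,u)) \approx 0, \qquad (6)$$
where the map $\hat A : \mathcal{U} \times L_2[0,1] \to L_2[0,1]$ is defined by
$$\hat A(u,g)(\cdot) = E_n\Big[\Big\{u - 1\Big(Y_i \le \int_0^1 g(t)\hat X_i^c(t)\,dt\Big)\Big\}\hat X_i^c(\cdot)\Big], \quad u \in \mathcal{U},\ g \in L_2[0,1].$$
To see (6), observe that, by the orthonormality of $\{\hat\phi_j\}$, $\int_0^1 \hat b(t,u)\hat X_i^c(t)\,dt = \sum_{k=1}^m \hat\xi_{ik}\hat b_k(u)$, so that $\hat A(u, \hat b(\cdot,u))(\cdot) = \sum_{j=1}^m E_n[\{u - 1(Y_i \le \sum_{k=1}^m \hat\xi_{ik}\hat b_k(u))\}\hat\xi_{ij}]\,\hat\phi_j(\cdot)$. The first order condition for (4) implies that
$$E_n\Big[\Big\{u - 1\Big(Y_i \le \sum_{k=1}^m \hat\xi_{ik}\hat b_k(u)\Big)\Big\}\hat\xi_{ij}\Big] \approx 0, \quad 1 \le j \le m,$$
which leads to (6) [the discussion here is informal and is only meant to give the intuition behind our estimator]. Note that solving (4) is computationally more appealing than directly searching for a solution to (6), as the former problem is convex while the latter is not.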

Meanwhile, as long as the map $y \mapsto F_{Y|X}(y \mid X)$ is continuous, for any fixed $u \in \mathcal{U}$, the nonlinear inverse problem (5) is locally ill-posed at $b(\cdot,u)$ in the sense of Hofmann and Scherzer [24, Definition 1.1], i.e., there exists a sequence of functions $\{g_n\}$ in a neighborhood of $b(\cdot,u)$ (in $L_2[0,1]$) such that $A(u, g_n) \to A(u, b(\cdot,u))$ but $g_n \not\to b(\cdot,u)$ in the $L_2$-norm. To see this, take a sequence of functions $\{g_n\}$ in a neighborhood of $b(\cdot,u)$ such that $g_n \rightharpoonup b(\cdot,u)$ but $g_n \not\to b(\cdot,u)$ in the $L_2$-norm, where $\rightharpoonup$ denotes weak convergence in $L_2[0,1]$. Then, by the weak convergence, we have
$$\int_0^1 g_n(t)X^c(t)\,dt \to \int_0^1 b(t,u)X^c(t)\,dt. \qquad (7)$$
By the continuity of the map $y \mapsto F_{Y|X}(y \mid X)$ and the dominated convergence theorem, (7) implies that $A(u, g_n) \to A(u, b(\cdot,u))$ despite $g_n \not\to b(\cdot,u)$. This suggests that any sensible estimation procedure based on the normal equation (5) has to involve some regularization. In our case, the regularization is done by restricting the parameter space for $b(\cdot,u)$ to a sequence of finite dimensional subspaces, where the cut-off level $m$ plays the role of a regularization parameter.
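A numerical illustration of the ill-posedness argument, with functions chosen here for illustration: take $g_n = b(\cdot,u) + \epsilon h_n$ with $h_n(t) = \sqrt{2}\sin(n\pi t)$. Since $\|h_n\| = 1$, $g_n$ stays at $L_2$-distance $\epsilon$ from $b(\cdot,u)$, yet $h_n \rightharpoonup 0$, so $\int_0^1 g_n(t)x(t)\,dt \to \int_0^1 b(t,u)x(t)\,dt$ for every fixed $x \in L_2[0,1]$. The Python sketch below checks this with the illustrative path $x(t) = t(1-t)$, for which $\int_0^1 h_n(t)x(t)\,dt = 4\sqrt{2}/(n\pi)^3$ when $n$ is odd (and $0$ when $n$ is even).

```python
import numpy as np


def trap(y, t):
    """Trapezoidal rule along the last axis on a uniform grid t."""
    h = t[1] - t[0]
    return h * (y.sum(axis=-1) - 0.5 * (y[..., 0] + y[..., -1]))


t = np.linspace(0.0, 1.0, 10001)
x = t * (1.0 - t)                                      # fixed path playing the role of X^c
ns = np.array([1, 11, 51, 201])
H = np.sqrt(2.0) * np.sin(np.pi * np.outer(ns, t))     # h_n = (g_n - b) / eps, orthonormal
norms = np.sqrt(trap(H ** 2, t))                       # stays at 1: no convergence in L2
inner = trap(H * x, t)                                 # int h_n x dt -> 0: weak convergence
for n, nrm, ip in zip(ns, norms, inner):
    print(f"n={n:4d}  ||h_n||={nrm:.6f}  int h_n x dt={ip:.3e}")
```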
2.4. Monotonization

Suppose in this section that $\mathcal{U}$ is a bounded closed interval: $\mathcal{U} = [u_L, u_U]$ with $0 < u_L < u_U < 1$. The conditional quantile function $Q_{Y|X}(u \mid x)$ is monotonically nondecreasing in $u$. However, the plug-in estimator $\hat Q_{Y|X}(u \mid x)$ constructed above is not necessarily so. To circumvent this problem, we may monotonize the map $u \mapsto \hat Q_{Y|X}(u \mid x)$ by one of the following three methods: (i) rearrangement [13]; (ii) isotonization [1]; (iii) a convex combination of (i) and (ii). These methods are explained in [13] in a general setup. It is shown in [13] that any monotonized estimate is at least as good as the initial estimate $\hat Q_{Y|X}(\cdot \mid x)$ in the following sense: let $\check Q_{Y|X}(u \mid x)$ be any monotonized estimate for $Q_{Y|X}(u \mid x)$ given above. Then, for all $q \in [1, \infty]$,
$$\Big(\int_{\mathcal{U}} |\check Q_{Y|X}(u \mid x) - Q_{Y|X}(u \mid x)|^q\,du\Big)^{1/q} \le \Big(\int_{\mathcal{U}} |\hat Q_{Y|X}(u \mid x) - Q_{Y|X}(u \mid x)|^q\,du\Big)^{1/q}, \qquad (8)$$
where the obvious modification is made when $q = \infty$.
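On an equally spaced grid of quantile indices the three monotonization methods reduce to simple operations: rearrangement sorts the values, isotonization is the least-squares projection onto nondecreasing sequences (computed below with the pool-adjacent-violators algorithm), and (iii) mixes the two. The Python sketch is an illustration under these grid assumptions, with hypothetical names and a made-up non-monotone curve; it also checks the $q = 2$ case of (8) in its grid form.

```python
import numpy as np


def rearrange(q):
    """(i) Monotone rearrangement on an equally spaced u-grid: sort the values."""
    return np.sort(np.asarray(q, dtype=float))


def isotonize(q):
    """(ii) Nondecreasing least-squares fit by pool-adjacent-violators (equal weights)."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in np.asarray(q, dtype=float):
        blocks.append([v, 1])
        # merge while the previous block's mean exceeds the current block's mean
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])


def monotonize(q, lam=0.5):
    """(iii) Convex combination lam * rearrangement + (1 - lam) * isotonization."""
    return lam * rearrange(q) + (1.0 - lam) * isotonize(q)


u = np.linspace(0.2, 0.8, 7)                                           # grid on U
q_true = u.copy()                                                      # a nondecreasing target
q_hat = q_true + np.array([0.0, 0.15, -0.15, 0.0, 0.1, -0.1, 0.0])     # non-monotone estimate


def grid_l2_error(q):
    # grid version of (int_U |q(u) - Q(u|x)|^2 du)^(1/2), up to a constant factor
    return np.sqrt(np.mean((q - q_true) ** 2))


errors = {name: grid_l2_error(f(q_hat)) for name, f in
          [("initial", lambda q: q), ("rearranged", rearrange),
           ("isotonized", isotonize), ("combined", monotonize)]}
```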



3. Rates of convergence 

In this section, we derive rates of convergence for the estimators defined in the 
previous section, and argue their optimality. We make the following assumptions. 
Let $C_1 > 1$ be some sufficiently large constant. First of all, we assume:

(A1) $\{(Y_i, X_i)\}_{i=1}^{\infty}$ is i.i.d. with $(Y, X)$.

(A2) $\int_0^1 E[X^4(t)]\,dt \le C_1$ and $E[\xi_j^4] \le C_1\kappa_j^2$ for all $j \ge 1$.

The i.i.d. assumption is conventional. It is beyond the scope of the paper to 
extend the theory to dependent data. Note that [25] discussed weakly dependent 
functional data. Assumption (A2) is a mild moment restriction. 

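(A3) For some $\alpha > 1$, $C_1^{-1}j^{-\alpha} \le \kappa_j \le C_1 j^{-\alpha}$ and $\kappa_j - \kappa_{j+1} \ge C_1^{-1}j^{-\alpha-1}$ for all $j \ge 1$.

(A4) For some $\beta > \alpha/2 + 1$, $\sup_{u \in \mathcal{U}} |b_j(u)| \le C_1 j^{-\beta}$ for all $j \ge 1$.

(A5) Let $F_{Y|X}(y \mid X)$ denote the conditional distribution function of $Y$ given $X$. Then, the map $y \mapsto F_{Y|X}(y \mid X)$ is twice continuously differentiable with $f_{Y|X}(y \mid X) = \partial F_{Y|X}(y \mid X)/\partial y$ and $f'_{Y|X}(y \mid X) = \partial f_{Y|X}(y \mid X)/\partial y$. Furthermore, $f_{Y|X}(y \mid X) \vee |f'_{Y|X}(y \mid X)| \le C_1$.

(A6) $\inf_{u \in \mathcal{U}} f_{Y|X}(Q_{Y|X}(u \mid X) \mid X) \ge C_1^{-1}$.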

Assumptions (A3) and (A4) are adapted from (3.2) and (3.3) of [22]. In assumption (A3), $\alpha$ measures the smoothness of the covariance kernel $K$, which also measures the difficulty of estimating the slope function $(t,u) \mapsto b(t,u)$. The second part of assumption (A3) requires the spacings among the $\kappa_j$ not to be too small, which ensures identifiability of the eigenfunctions $\phi_j$ and thereby sufficient estimation accuracy of $\hat\phi_j$. Assumption (A4) determines the "smoothness" of the function $t \mapsto b(t,u)$. The condition $\beta > \alpha/2 + 1$ requires the function $t \mapsto b(t,u)$ to be sufficiently smooth relative to $K$ uniformly in $u \in \mathcal{U}$. See Hall and Horowitz [22, p. 74] for some related discussion of these assumptions. Assumptions (A5) and (A6) are specific to the quantile regression case. Both assumptions are standard in the quantile regression literature when $X$ is a vector. Assumption (A6) ensures sufficient identifiability of the conditional $u$-quantiles for $u \in \mathcal{U}$.

(A7) For each $i = 1, \dots, n$, $X_i$ is observed only at discrete points $0 = t_{i1} < t_{i2} < \cdots < t_{i,L_i+1} = 1$. Define $\Delta = \Delta_n = \max_{1 \le i \le n}\max_{1 \le l \le L_i}(t_{i,l+1} - t_{il})$. Then, $\Delta \to 0$ as $n \to \infty$.

(A8) There exists a constant $\gamma \in (0, 2]$ such that $K(t,t) - 2K(s,t) + K(s,s) \le C_1(t - s)^{\gamma}$ for all $s, t \in [0,1]$ with $s < t$.

Assumptions (A7) and (A8) are a set of sampling assumptions on $X_i$. A rate restriction will be imposed on $\Delta$. Assumption (A7) in particular requires $\min_{1 \le i \le n} L_i \to \infty$ as $n \to \infty$, which means that each set of discrete points $t_{i1}, \dots, t_{i,L_i+1}$ has to become dense in $[0,1]$ as the sample size grows. Assumption (A8) is an additional assumption on the smoothness of the covariance kernel. For example, $\gamma = 1$ if $K(s,t)$ is Lipschitz continuous and $\gamma = 2$ if $K(s,t)$ is twice continuously differentiable. The value of $\gamma$ controls the discretization error. Note that the possible values of $\gamma$ depend on the value of $\alpha$. Typically, if $\alpha$ is sufficiently large, (A8) is satisfied with $\gamma = 2$. Assumption (A8) is similar in spirit to (A2) of [14], in which they directly assumed some smoothness of the random function $t \mapsto X(t)$ to deal with the discretization error (roughly speaking, their $2k$ corresponds to our $\gamma$).
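As a worked check of (A8), consider (as an illustration chosen here, not taken from the paper) standard Brownian motion on $[0,1]$, whose covariance kernel is $K(s,t) = s \wedge t$. For $0 \le s < t \le 1$,
$$K(t,t) - 2K(s,t) + K(s,s) = t - 2s + s = t - s,$$
so (A8) holds with $\gamma = 1$ and any $C_1 \ge 1$, but with no $\gamma > 1$, because $(t-s)/(t-s)^{\gamma} \to \infty$ as $t - s \downarrow 0$. Its eigenvalues are $\kappa_j = \{(j - 1/2)\pi\}^{-2}$, which satisfy both parts of (A3) with $\alpha = 2$ for a suitable $C_1$.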

Let $\mathcal{F} = \mathcal{F}(C_1, \alpha, \beta, \gamma)$ denote the set of all distributions of $(Y, X)$ compatible with assumptions (A2)-(A6) and (A8) for given (admissible) values of $C_1$, $\alpha$, $\beta$ and $\gamma$ (such that $\mathcal{F} \ne \emptyset$). The following theorem, which will be proved in Section 5 below, establishes rates of convergence for the slope estimator $(t,u) \mapsto \hat b(t,u)$.

Theorem 1. Suppose that assumptions (A1)-(A8) are satisfied. Take $m \asymp n^{1/(\alpha+2\beta)}$. Then, we have
$$\lim_{M \to \infty}\limsup_{n \to \infty}\sup_{F \in \mathcal{F}} P_F\Big\{\sup_{u \in \mathcal{U}}\int_0^1 \{\hat b(t,u) - b(t,u)\}^2\,dt > M n^{-(2\beta-1)/(\alpha+2\beta)}\Big\} = 0, \qquad (9)$$
provided that $\big(n \vee (\log n)\,m^{\ldots}\big)\Delta^{\gamma} = O(1)$ as $n \to \infty$.



Inspection of the proof of Theorem 1 shows that, if $X$ were continuously observable, under assumptions (A1)-(A6), the rate of convergence of the estimator based on the direct empirical covariance kernel would be $n^{-(2\beta-1)/(\alpha+2\beta)}$. The side condition on $\Delta$ in Theorem 1 is assumed to make the discretization error negligible. This condition does not seem very restrictive. For example, if $\beta > \alpha + 3/2$ and $\gamma = 2$, it is satisfied as long as $\Delta = O((n \log n)^{-1/2})$, which seems mild in view of some applications in functional data analysis.
The following theorem, which will be proved in Section 6, establishes rates of convergence for $\hat Q_{Y|X}(u \mid x)$. For notational convenience, define
$$\mathcal{E}(\hat Q_{Y|X}, u) = \int \{\hat Q_{Y|X}(u \mid x) - Q_{Y|X}(u \mid x)\}^2\,dP_X(x),$$
where $P_X$ denotes the distribution of $X$ (defined on $D[0,1]$).

Theorem 2. Suppose that assumptions (A1)-(A8) are satisfied. Take $m \asymp n^{1/(\alpha+2\beta)}$. Then, we have
$$\lim_{M \to \infty}\limsup_{n \to \infty}\sup_{F \in \mathcal{F}} P_F\Big\{\sup_{u \in \mathcal{U}}\mathcal{E}(\hat Q_{Y|X}, u) > M n^{-(\alpha+2\beta-1)/(\alpha+2\beta)}\Big\} = 0, \qquad (10)$$
provided that $\big(n \vee (\log n)\,m^{\ldots}\big)\Delta^{\gamma} = O(1)$ as $n \to \infty$.



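For monotonized estimators, the following corollary follows directly from (8) and Theorem 2. Indeed, taking $q = 2$ in (8), integrating both sides over $x$ with respect to $P_X$ and applying Fubini's theorem yields $\int_{\mathcal{U}}\mathcal{E}(\check Q_{Y|X}, u)\,du \le \int_{\mathcal{U}}\mathcal{E}(\hat Q_{Y|X}, u)\,du \le \sup_{u \in \mathcal{U}}\mathcal{E}(\hat Q_{Y|X}, u)$, since $\mathcal{U} \subset (0,1)$ has length less than one.

Corollary 1. Let $\mathcal{U}$ be a bounded closed interval in $(0,1)$. Suppose that all the assumptions of Theorem 2 are satisfied. Let $\check Q_{Y|X}(u \mid x)$ be any monotonized estimator for $Q_{Y|X}(u \mid x)$ given in Section 2.4. Then, we have
$$\lim_{M \to \infty}\limsup_{n \to \infty}\sup_{F \in \mathcal{F}} P_F\Big\{\int_{\mathcal{U}}\mathcal{E}(\check Q_{Y|X}, u)\,du > M n^{-(\alpha+2\beta-1)/(\alpha+2\beta)}\Big\} = 0.$$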



Here, note that the rate $n^{-(\alpha+2\beta-1)/(\alpha+2\beta)}$ attained in estimating $Q_{Y|X}(u \mid x)$ is faster than the rate $n^{-(2\beta-1)/(\alpha+2\beta)}$ attained in estimating $b(t,u)$, since $\alpha > 0$. In what follows, we discuss the optimality of these rates.
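The exponents in Theorems 1 and 2 are simple functions of $(\alpha, \beta)$; note that $(2\beta-1)/(\alpha+2\beta) = 1 - (\alpha+1)/(\alpha+2\beta)$ and $(\alpha+2\beta-1)/(\alpha+2\beta) = 1 - 1/(\alpha+2\beta)$. A small Python helper (hypothetical name, illustrative values $\beta = 3$ and $\alpha \in \{1.1, 2\}$) makes the comparison explicit and checks the constraints in (A3)-(A4). Since $m \asymp n^{1/(\alpha+2\beta)}$ fixes only the order of $m$, the constant in front still has to be chosen, e.g. by the criteria of Section 4.

```python
def rate_exponents(alpha, beta):
    """Exponents from Theorems 1-2: m ~ n^cutoff, slope error ~ n^(-slope),
    conditional quantile error ~ n^(-quantile)."""
    if not alpha > 1:
        raise ValueError("(A3) requires alpha > 1")
    if not beta > alpha / 2 + 1:
        raise ValueError("(A4) requires beta > alpha/2 + 1")
    d = alpha + 2 * beta
    return {
        "cutoff": 1 / d,
        "slope": (2 * beta - 1) / d,
        "quantile": (alpha + 2 * beta - 1) / d,
    }


results = {alpha: rate_exponents(alpha, beta=3.0) for alpha in (1.1, 2.0)}
```

With these values, increasing $\alpha$ lowers the slope exponent (about 0.704 to 0.625) and raises the quantile exponent (about 0.859 to 0.875).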

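Proposition 1. Suppose that assumptions (A1)-(A6) and (A8) are satisfied. Let $\gamma$ be such that
$$0 < \gamma \le \begin{cases} \alpha - 1, & \text{if } \alpha < 3, \\ 2, & \text{if } \alpha \ge 3. \end{cases} \qquad (11)$$
Then, there exists a constant $M > 0$ such that for $\mathcal{F} = \mathcal{F}(C_1, \alpha, \beta, \gamma)$,
$$\liminf_{n \to \infty}\inf_{\bar b}\sup_{F \in \mathcal{F}} P_F\Big\{\sup_{u \in \mathcal{U}}\int_0^1 \{\bar b(t,u) - b(t,u)\}^2\,dt > M n^{-(2\beta-1)/(\alpha+2\beta)}\Big\} > 0,$$
where $\inf_{\bar b}$ is taken over all estimators $\bar b$ for the slope function $(t,u) \mapsto b(t,u)$ based on $(Y_1, X_1), \dots, (Y_n, X_n)$. Similarly, there exists a constant $M > 0$ such that for $\mathcal{F} = \mathcal{F}(C_1, \alpha, \beta, \gamma)$,
$$\liminf_{n \to \infty}\inf_{\bar Q_{Y|X}}\sup_{F \in \mathcal{F}} P_F\Big\{\sup_{u \in \mathcal{U}}\mathcal{E}(\bar Q_{Y|X}, u) > M n^{-(\alpha+2\beta-1)/(\alpha+2\beta)}\Big\} > 0,$$
where $\inf_{\bar Q_{Y|X}}$ is taken over all estimators $\bar Q_{Y|X}$ for the conditional quantile function $Q_{Y|X} : (u,x) \mapsto Q_{Y|X}(u \mid x)$ based on $(Y_1, X_1), \dots, (Y_n, X_n)$. In case of $\mathcal{U}$ being a bounded closed interval in $(0,1)$, there exists a constant $M > 0$ such that for $\mathcal{F} = \mathcal{F}(C_1, \alpha, \beta, \gamma)$,
$$\liminf_{n \to \infty}\inf_{\bar Q_{Y|X}}\sup_{F \in \mathcal{F}} P_F\Big\{\int_{\mathcal{U}}\mathcal{E}(\bar Q_{Y|X}, u)\,du > M n^{-(\alpha+2\beta-1)/(\alpha+2\beta)}\Big\} > 0,$$
where the previous convention applies.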

The side condition (11) is a compatibility condition between assumptions (A3) and (A8). Whether this condition is tight is not addressed here. However, some restriction between $\alpha$ and $\gamma$ is required in establishing lower bounds on the rates of convergence, to guarantee that the class $\mathcal{F}(C_1, \alpha, \beta, \gamma)$ is at least nonempty. Proposition 1 shows that under this side condition the rates established in Theorems 1, 2 and Corollary 1 are indeed optimal in the minimax sense. A proof of Proposition 1 is given in Appendix A.



4. Empirical choice of the cut-off level 



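In this section, we suggest three criteria for choosing $m$, and investigate their performance by simulations. We use heuristic reasoning to derive the selection criteria. Suppose that $\mathcal{U}$ is a singleton: $\mathcal{U} = \{u\}$. Suppose also that there is no truncation bias, i.e., $b(t,u) = \sum_{j=1}^m b_j(u)\phi_j(t)$ and $Q_{Y|X}(u \mid X) = a(u) + \sum_{j=1}^m b_j(u)\xi_j$. Then, the infeasible estimator $(\tilde a(u), \tilde b_1(u), \dots, \tilde b_m(u))'$ defined by (3) can be regarded as a (conditional) maximum likelihood estimator when the conditional distribution of $Y$ given $X$ has the asymmetric Laplace density of the form
$$f(y \mid X, u, \sigma) = \frac{u(1-u)}{\sigma}\exp\Big\{-\frac{1}{\sigma}\rho_u\Big(y - a(u) - \sum_{j=1}^m b_j(u)\xi_j\Big)\Big\},$$
where $\sigma > 0$ is a scale parameter. (Maximizing this likelihood in $\sigma$ gives $\hat\sigma = E_n[\rho_u(Y_i - a(u) - \sum_{j=1}^m b_j(u)\xi_{ij})]$, so the maximized log-likelihood equals $-n\log\hat\sigma$ up to an additive constant; dividing the usual AIC and BIC by $2n$ gives the forms below.) This suggests the following analogues of AIC and BIC in the present context:
$$\mathrm{AIC}(u) = \log\Big(\frac{1}{n}\sum_{i=1}^n \rho_u\Big(Y_i - \hat a(u) - \sum_{j=1}^m \hat b_j(u)\hat\xi_{ij}\Big)\Big) + \frac{m+1}{n},$$
$$\mathrm{BIC}(u) = \log\Big(\frac{1}{n}\sum_{i=1}^n \rho_u\Big(Y_i - \hat a(u) - \sum_{j=1}^m \hat b_j(u)\hat\xi_{ij}\Big)\Big) + \frac{(m+1)\log n}{2n}.$$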



See also Koenker [28, Section 4.9.1] for some related discussion. Following [38], we may also consider an analogue of GACV as follows:
$$\mathrm{GACV}(u) = \frac{\sum_{i=1}^n \rho_u\big(Y_i - \hat a(u) - \sum_{j=1}^m \hat b_j(u)\hat\xi_{ij}\big)}{n - (m+1)}.$$



When $\mathcal{U}$ is a bounded closed interval, define the integrated AIC, BIC, and GACV as follows:
$$\mathrm{IAIC} = \int_{\mathcal{U}}\mathrm{AIC}(u)\,du, \qquad \mathrm{IBIC} = \int_{\mathcal{U}}\mathrm{BIC}(u)\,du, \qquad \mathrm{IGACV} = \int_{\mathcal{U}}\mathrm{GACV}(u)\,du.$$
When $\mathcal{U}$ is a finite set of grid points, each integral is replaced by the summation over the grid points.
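Given the in-sample residuals of a fitted model, the three criteria are cheap to evaluate. The Python sketch below assumes the normalizations $\log\hat\sigma + (m+1)/n$ for AIC and $\log\hat\sigma + (m+1)\log n/(2n)$ for BIC, with $\hat\sigma = n^{-1}\sum_{i=1}^n \rho_u(\cdot)$ the maximizer of the asymmetric Laplace likelihood in $\sigma$, together with the GACV ratio $\sum_{i=1}^n \rho_u(\cdot)/(n - (m+1))$; the function names are hypothetical. The quantile regression fits themselves, one per candidate $m$ and per $u$, are not included.

```python
import numpy as np


def check_loss(y, u):
    """Check function rho_u(y) = (u - 1(y < 0)) * y."""
    y = np.asarray(y, dtype=float)
    return (u - (y < 0)) * y


def selection_criteria(residuals, u, m):
    """AIC(u), BIC(u), GACV(u) from the n in-sample residuals
    Y_i - a_hat(u) - sum_{j<=m} b_hat_j(u) xi_hat_ij of a fit with m scores plus an intercept."""
    r = np.asarray(residuals, dtype=float)
    n = r.size
    total = check_loss(r, u).sum()
    sigma_hat = total / n                      # maximizes the asymmetric Laplace likelihood in sigma
    aic = np.log(sigma_hat) + (m + 1) / n
    bic = np.log(sigma_hat) + (m + 1) * np.log(n) / (2 * n)
    gacv = total / (n - (m + 1))
    return aic, bic, gacv


def integrated(criterion_values):
    """Finite-grid version of IAIC / IBIC / IGACV: sum over the u-grid."""
    return float(np.sum(criterion_values))


aic, bic, gacv = selection_criteria(np.ones(10), u=0.5, m=1)
```

To choose $m$, one would evaluate the chosen criterion (or its sum over the $u$-grid) for each candidate $m$ and take the minimizer.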

We carried out a small Monte Carlo study to investigate the finite sample performance of these criteria. In all cases, the number of Monte Carlo repetitions was 1,000. The numerical results in this section were obtained using the matrix language Ox [16], with the Ox code for solving quantile regression problems supplied on Professor Koenker's website. See also [30] for some computational aspects of quantile regression problems.

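The simulation design is described as follows:
$$Y = \int_0^1 g(t)X(t)\,dt + \varepsilon,$$
$$g(t) = \sum_{j=1}^{50} g_j\phi_j(t), \quad g_1 = 0.3, \quad g_j = 4(-1)^{j+1}j^{-2},\ j \ge 2, \quad \phi_j(t) = 2^{1/2}\cos(j\pi t),$$
$$X(t) = \sum_{j=1}^{50} \gamma_j Z_j\phi_j(t), \quad \gamma_j = (-1)^{j+1}j^{-\alpha/2}, \quad \alpha \in \{1.1, 2\}, \quad Z_j \sim U[-3^{1/2}, 3^{1/2}],$$
$$\varepsilon \sim N(0,1) \text{ or Cauchy}, \quad n \in \{100, 200, 500\}.$$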

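Each $X_i$ is observed at 201 equally spaced grid points on $[0,1]$. In this design, $E[X(t)] = 0$, so that $X^c = X$, and we have
$$Q_{Y|X}(u \mid X) = F_\varepsilon^{-1}(u) + \int_0^1 g(t)X(t)\,dt,$$
where $F_\varepsilon^{-1}(\cdot)$ is the quantile function of $\varepsilon$. Thus, $a(u) = F_\varepsilon^{-1}(u)$ and $b(t,u) = g(t)$ ($b(t,u)$ does not depend on $u$). We considered two cases for $\mathcal{U}$: (a) $\mathcal{U} = \{0.5\}$ and (b) $\mathcal{U} = \{0.15, 0.2, \dots, 0.85\}$. In each case, the performance was measured by
$$\text{QA-MISE} = \frac{1}{\mathrm{Card}(\mathcal{U})}\sum_{u \in \mathcal{U}} E\Big[\int_0^1 \{\hat b(t,u) - b(t,u)\}^2\,dt\Big]$$
or
$$\text{QA-MISE} = \frac{1}{\mathrm{Card}(\mathcal{U})}\sum_{u \in \mathcal{U}} E\Big[\int \{\hat Q_{Y|X}(u \mid x) - Q_{Y|X}(u \mid x)\}^2\,dP_X(x)\Big],$$
where $P_X$ denotes the distribution of $X$ and QA-MISE is the abbreviation of "quantile-averaged mean integrated squared error".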

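The design above can be reproduced along the following lines. This is a plain-Python sketch, not the Ox code used for the reported results: it draws $(X_i, Y_i)$ on the 201-point grid and evaluates the integrated squared error of a candidate slope estimate averaged over the levels in $\mathcal{U}$; averaging this quantity over Monte Carlo replications gives the first QA-MISE. It uses the orthonormality of $\phi_1, \ldots, \phi_{50}$, so that $\int_0^1 g(t)X(t)\,dt = \sum_j g_j\gamma_j Z_j$ exactly, and approximates other integrals by the trapezoidal rule. All function names are illustrative.

```python
import math
import random

J = 50                                # number of basis functions in the design
GRID = [k / 200 for k in range(201)]  # 201 equally spaced points on [0, 1]

def phi(j, t):
    """Basis of the design: phi_1 = 1 and phi_j(t) = sqrt(2) cos(j pi t) for j >= 2."""
    return 1.0 if j == 1 else math.sqrt(2.0) * math.cos(j * math.pi * t)

G_COEF = [0.3] + [4.0 * (-1) ** (j + 1) / j ** 2 for j in range(2, J + 1)]  # g_1, ..., g_50
PHI = [[phi(j, t) for t in GRID] for j in range(1, J + 1)]                 # basis on the grid

def draw_sample(n, alpha, cauchy=False, seed=None):
    """Draw n pairs (X_i evaluated on GRID, Y_i) from the simulation design."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = [rng.uniform(-math.sqrt(3.0), math.sqrt(3.0)) for _ in range(J)]
        # scores gamma_j Z_j with gamma_j = (-1)^(j+1) j^(-alpha/2)
        score = [(-1) ** (j + 1) * j ** (-alpha / 2) * z[j - 1] for j in range(1, J + 1)]
        x = [sum(score[j] * PHI[j][k] for j in range(J)) for k in range(len(GRID))]
        eps = math.tan(math.pi * (rng.random() - 0.5)) if cauchy else rng.gauss(0.0, 1.0)
        # Orthonormality gives int_0^1 g(t) X(t) dt = sum_j g_j * score_j exactly.
        y = sum(gj * sj for gj, sj in zip(G_COEF, score)) + eps
        data.append((x, y))
    return data

def trapezoid(values, grid=GRID):
    """Trapezoidal rule for int_0^1 of a function given by its values on grid."""
    return sum((grid[k + 1] - grid[k]) * (values[k] + values[k + 1]) / 2
               for k in range(len(grid) - 1))

def b_true(t, u):
    """True slope in this design: b(t, u) = g(t), which does not depend on u."""
    return sum(gj * phi(j, t) for j, gj in enumerate(G_COEF, start=1))

def qa_ise(b_hat, b, levels):
    """(1 / Card U) * sum_{u in U} int_0^1 {b_hat(t, u) - b(t, u)}^2 dt on the grid."""
    return sum(trapezoid([(b_hat(t, u) - b(t, u)) ** 2 for t in GRID])
               for u in levels) / len(levels)
```

For case (b), one would pass `levels=[0.15 + 0.05 * k for k in range(15)]`.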
The simulation results for case (a) are summarized in Figures 1-4. Figures 1 and 2 show the performance of the selection criteria in the normal error case, while Figures 3 and 4 show it in the Cauchy error case. In each figure, "Fixed" refers to the performance of the estimator with fixed $m$. In the normal error case, BIC worked better than the other two criteria. On the other hand, in the Cauchy error case, AIC and GACV worked better than BIC. Looking closely at these figures, one finds that AIC and GACV performed quite badly in some cases (see the bottom half of Figure 1). Although none of these criteria clearly dominated the others, BIC worked relatively stably. These figures also show that as $\alpha$ increases from 1.1 to 2, the performance of $\hat b(\cdot, 0.5)$ becomes worse, while that of $\hat Q_{Y|X}(0.5 \mid x)$ becomes better. This is consistent with the theoretical results in the previous section. Essentially similar comments apply to case (b) (Figures 5-8).

5. Proof of Theorem 1 

We divide the proof into three subsections. Some technical results are proved in Appendix B. To avoid notational complications, uniformity in $F \in \mathcal{F}$ will be suppressed. Let $C > 0$ denote a generic constant whose value may change from line to line. In most cases, the qualification "almost surely" will be suppressed. In some parts of the proofs, we use empirical process techniques, and we follow the basic notation used in [36].

5.1. Reduction of the problem 

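Let $b_0(u) = a(u)$ and $\hat\xi_{i0} = \xi_{i0} = 1$. For any $v_0, \ldots, v_m$, write $v^m := (v_0, \ldots, v_m)'$. For $a^m \in \mathbb{R}^{m+1}$ and $b^m \in \mathbb{R}^{m+1}$, write $a^m \cdot b^m = \sum_{j=0}^{m} a_j b_j$. Then,

$$
\hat b^m(u) = (\hat b_0(u), \hat b_1(u), \ldots, \hat b_m(u))' = \arg\min_{b^m \in \mathbb{R}^{m+1}} \mathbb{E}_n[\rho_u(Y_i - \hat\xi_i^m \cdot b^m)].
$$

We use a further re-parameterization. Let $\eta_{ij} = \kappa_j^{-1/2}\xi_{ij}$, $\hat\eta_{ij} = \kappa_j^{-1/2}\hat\xi_{ij}$, $d_j(u) = \kappa_j^{1/2} b_j(u)$ and $\hat d_j(u) = \kappa_j^{1/2}\hat b_j(u)$, with the convention $\kappa_0 := 1$. Note that $E[\eta_{ij}] = 0$ and $E[\eta_{ij}^2] = 1$ for $j \ge 1$, and $E[\eta_{ij}\eta_{ik}] = 0$ for all $j \neq k$. Then,

$$
\hat d^m(u) = (\hat d_0(u), \ldots, \hat d_m(u))' = \arg\min_{d^m \in \mathbb{R}^{m+1}} \mathbb{E}_n[\rho_u(Y_i - \hat\eta_i^m \cdot d^m)].
$$

We first bound $\sup_{u \in \mathcal{U}} \|\hat d^m(u) - d^m(u)\|_{\ell^2}$.

Lemma 1. Suppose that for all $\epsilon > 0$, there exist constants $c > 0$ and $M > 0$, possibly depending on $\epsilon$, such that

$$
\liminf_{n \to \infty} P\Big\{ -\mathbb{E}_n\big[\{u - 1(Y_i \le \hat\eta_i^m \cdot (d^m(u) + M\sqrt{m/n}\, h^m))\}(h^m \cdot \hat\eta_i^m)\big] \ge c\sqrt{m/n},\ \forall u \in \mathcal{U}, \forall h^m \in \mathbb{S}^m \Big\} \ge 1 - \epsilon.
$$

Then, we have

$$
\limsup_{n \to \infty} P\Big\{ \sup_{u \in \mathcal{U}} \|\hat d^m(u) - d^m(u)\|_{\ell^2} > M\sqrt{m/n} \Big\} \le \epsilon.
$$

Proof. The proof is divided into three steps.

Step 1: $\|\mathbb{E}_n[\{u - 1(Y_i \le \hat\eta_i^m \cdot \hat d^m(u))\}\hat\eta_i^m]\|_{\ell^2} \le \{(m+1)/n\} \max_{1 \le i \le n} \|\hat\eta_i^m\|_{\ell^2}$ for all $u \in \mathcal{U}$. The proof is based on the next lemma.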
Fig 1. Performance of selection criteria. Case (a). Estimation of $b(\cdot, 0.5)$. [Six panels: $n = 100, 200, 500$ with $\alpha = 1.1$ (top row) and $\alpha = 2$ (bottom row), normal error; horizontal axis from 2 to 14.]

Fig 2. Performance of selection criteria. Case (a). Estimation of $Q_{Y|X}(0.5 \mid x)$. [Six panels: $n = 100, 200, 500$ with $\alpha = 1.1$ (top row) and $\alpha = 2$ (bottom row), normal error; horizontal axis from 2 to 14.]

Fig 3. Performance of selection criteria. Case (a). Estimation of $b(\cdot, 0.5)$. [Six panels: $n = 100, 200, 500$ with $\alpha = 1.1$ (top row) and $\alpha = 2$ (bottom row), Cauchy error; horizontal axis from 2 to 14.]

Fig 4. Performance of selection criteria. Case (a). Estimation of $Q_{Y|X}(0.5 \mid x)$. [Six panels: $n = 100, 200, 500$ with $\alpha = 1.1$ (top row) and $\alpha = 2$ (bottom row), Cauchy error; horizontal axis from 2 to 14.]

Fig 5. Performance of selection criteria. Case (b). Estimation of $(t, u) \mapsto b(t, u)$. [Six panels: $n = 100, 200, 500$ with $\alpha = 1.1$ (top row) and $\alpha = 2$ (bottom row), normal error; horizontal axis from 2 to 14.]

Fig 6. Performance of selection criteria. Case (b). Estimation of $(u, x) \mapsto Q_{Y|X}(u \mid x)$. [Six panels: $n = 100, 200, 500$ with $\alpha = 1.1$ (top row) and $\alpha = 2$ (bottom row), normal error; horizontal axis from 2 to 14.]

Fig 7. Performance of selection criteria. Case (b). Estimation of $(t, u) \mapsto b(t, u)$. [Six panels: $n = 100, 200, 500$ with $\alpha = 1.1$ (top row) and $\alpha = 2$ (bottom row), Cauchy error; horizontal axis from 2 to 14.]

Fig 8. Performance of selection criteria. Case (b). Estimation of $(u, x) \mapsto Q_{Y|X}(u \mid x)$. [Six panels: $n = 100, 200, 500$ with $\alpha = 1.1$ (top row) and $\alpha = 2$ (bottom row), Cauchy error; horizontal axis from 2 to 14.]
Lemma 2. Let $\{(y_i, z_i')'\}_{i=1}^{n}$ be a sequence of pairs of non-stochastic variables, where $y_i \in \mathbb{R}$ and $z_i \in \mathbb{R}^k$. Pick any $u \in (0, 1)$. Let $\hat d(u) \in \mathbb{R}^k$ be any solution to the minimization problem

$$
\min_{d \in \mathbb{R}^k} \mathbb{E}_n[\rho_u(y_i - z_i'd)].
$$

Then, we have

$$
\|\mathbb{E}_n[\{u - 1(y_i \le z_i'\hat d(u))\}z_i]\|_{\ell^2} \le n^{-1}\,\mathrm{Card}(\{i \in \{1, \ldots, n\} : y_i = z_i'\hat d(u)\}) \max_{1 \le i \le n}\|z_i\|_{\ell^2}.
$$

Proof of Lemma 2. Follows from a small modification of El-Attar et al. [17, Lemma 2.1]. □

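Recall that $\hat\eta_i^m$ depends only on $X_1^n := \{X_1, \ldots, X_n\}$ and not on $Y_1, \ldots, Y_n$. Since the conditional distribution of $Y_1, \ldots, Y_n$ given $X_1^n$ is absolutely continuous, by Sard's theorem, $\sup_{u \in \mathcal{U}} \mathrm{Card}(\{i \in \{1, \ldots, n\} : Y_i = \hat\eta_i^m \cdot \hat d^m(u)\}) \le m + 1$ almost surely. To be more precise, pick any subset $I \subset \{1, \ldots, n\}$ such that $\mathrm{Card}(I) \ge m + 2$. Conditional on $X_1^n$, consider the set

$$
S_I := \{(\hat\eta_i^m \cdot \delta^m)_{i \in I} : \delta^m \in \mathbb{R}^{m+1}\} \subset \mathbb{R}^{\mathrm{Card}(I)}.
$$

Then, $S_I$ is a linear subspace of dimension at most $m + 1$. Suppose that $Y_i = \hat\eta_i^m \cdot \hat d^m(u)$ for all $i \in I$ for some $u \in \mathcal{U}$. Then, $(Y_i)_{i \in I} \in S_I$, by which we have

$$
P\{Y_i = \hat\eta_i^m \cdot \hat d^m(u),\ \forall i \in I,\ \exists u \in \mathcal{U} \mid X_1^n\} \le P\{(Y_i)_{i \in I} \in S_I \mid X_1^n\}. \tag{12}
$$

However, by Sard's theorem [see 32], the Lebesgue measure of $S_I$ in $\mathbb{R}^{\mathrm{Card}(I)}$ is zero, and by the absolute continuity of the conditional distribution of $(Y_i)_{i \in I}$ given $X_1^n$, the right side of (12) is zero. Thus, we conclude that

$$
P\Big\{ \sup_{u \in \mathcal{U}} \mathrm{Card}(\{i : Y_i = \hat\eta_i^m \cdot \hat d^m(u)\}) \ge m + 2 \,\Big|\, X_1^n \Big\} \le \sum_{\substack{I \subset \{1, \ldots, n\} \\ \mathrm{Card}(I) \ge m+2}} P\{Y_i = \hat\eta_i^m \cdot \hat d^m(u),\ \forall i \in I,\ \exists u \in \mathcal{U} \mid X_1^n\} = 0,
$$

by which we have $\sup_{u \in \mathcal{U}} \mathrm{Card}(\{i : Y_i = \hat\eta_i^m \cdot \hat d^m(u)\}) \le m + 1$ almost surely. Then, the conclusion of Step 1 follows from an application of Lemma 2.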



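Step 2: We have

$$
\max_{1 \le i \le n} \|\hat\eta_i^m\|_{\ell^2} = o_p\{(\log n)^{-1}\sqrt{n/m}\}. \tag{13}
$$

We defer the proof of (13) to Appendix B.

Step 3: Proof of the lemma. Define

$$
k^m(u) := \begin{cases} \dfrac{\hat d^m(u) - d^m(u)}{\|\hat d^m(u) - d^m(u)\|_{\ell^2}}, & \text{if } \hat d^m(u) \neq d^m(u), \\ 0, & \text{otherwise.} \end{cases}
$$

Then, by Steps 1 and 2, we have

$$
\sup_{u \in \mathcal{U}} |\mathbb{E}_n[\{u - 1(Y_i \le \hat\eta_i^m \cdot \hat d^m(u))\}(k^m(u) \cdot \hat\eta_i^m)]| = o_p(\sqrt{m/n}).
$$

Define the event

$$
\mathcal{E}_n := \Big\{ -\mathbb{E}_n[\{u - 1(Y_i \le \hat\eta_i^m \cdot (d^m(u) + M\sqrt{m/n}\, h^m))\}(h^m \cdot \hat\eta_i^m)] \ge c\sqrt{m/n},\ \forall u \in \mathcal{U}, \forall h^m \in \mathbb{S}^m \Big\}.
$$

Since the map

$$
t \mapsto -\mathbb{E}_n[\{u - 1(Y_i \le \hat\eta_i^m \cdot (d^m(u) + t h^m))\}(h^m \cdot \hat\eta_i^m)]
$$

is non-decreasing for all $u \in \mathcal{U}$ and $h^m \in \mathbb{S}^m$, the event $\mathcal{E}_n$ can also be written as

$$
\mathcal{E}_n = \Big\{ -\mathbb{E}_n[\{u - 1(Y_i \le \hat\eta_i^m \cdot (d^m(u) + t h^m))\}(h^m \cdot \hat\eta_i^m)] \ge c\sqrt{m/n},\ \forall t \ge M\sqrt{m/n}, \forall u \in \mathcal{U}, \forall h^m \in \mathbb{S}^m \Big\}.
$$

Thus, as $n \to \infty$,

$$
\begin{aligned}
P\Big\{ \sup_{u \in \mathcal{U}} \|\hat d^m(u) - d^m(u)\|_{\ell^2} > M\sqrt{m/n} \Big\}
&= P\{ \|\hat d^m(u) - d^m(u)\|_{\ell^2} > M\sqrt{m/n},\ \exists u \in \mathcal{U} \} \\
&\le P(\{ \|\hat d^m(u) - d^m(u)\|_{\ell^2} > M\sqrt{m/n},\ \exists u \in \mathcal{U} \} \cap \mathcal{E}_n) + P(\mathcal{E}_n^c) \\
&\le P\{ -\mathbb{E}_n[\{u - 1(Y_i \le \hat\eta_i^m \cdot \hat d^m(u))\}(k^m(u) \cdot \hat\eta_i^m)] \ge c\sqrt{m/n},\ \exists u \in \mathcal{U} \} + P(\mathcal{E}_n^c) \\
&\le o(1) + (1 + o(1))\epsilon.
\end{aligned}
$$

This completes the proof of Lemma 1. □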
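Lemma 2, which drives Step 1, can be checked in its simplest case $z_i = 1$ ($k = 1$), where minimizing $\mathbb{E}_n[\rho_u(y_i - d)]$ over $d$ yields a sample $u$-quantile. The sketch below (illustrative helper names, deterministic toy data) takes the $\lceil nu \rceil$-th order statistic as the minimizer and confirms that $|\mathbb{E}_n[u - 1(y_i \le \hat d(u))]|$ does not exceed $n^{-1}\,\mathrm{Card}\{i : y_i = \hat d(u)\}$.

```python
import math

def check_loss(r, u):
    """Check function rho_u(r) = r * (u - 1(r < 0))."""
    return r * (u - (1.0 if r < 0 else 0.0))

def sample_quantile(y, u):
    """The ceil(n u)-th order statistic, a minimiser of d -> E_n[rho_u(y_i - d)]."""
    ys = sorted(y)
    k = max(1, math.ceil(len(ys) * u))
    return ys[k - 1]

def subgradient_gap(y, u, d):
    """|E_n[u - 1(y_i <= d)]|, the left side of Lemma 2 when z_i = 1."""
    return abs(sum(u - (1.0 if yi <= d else 0.0) for yi in y) / len(y))
```

With distinct observations the count of interpolated points is one, so the bound reads $|\mathbb{E}_n[u - 1(y_i \le \hat d(u))]| \le 1/n$.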



5.2. Verification of the hypothesis of Lemma 1 

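Pick any $h^m = (h_0, h_1, \ldots, h_m)' \in \mathbb{S}^m$. For a given $M \ge 1$, let $\delta^m = (\delta_0, \delta_1, \ldots, \delta_m)' = M\sqrt{m/n}\, h^m$. Then,

$$
\begin{aligned}
-\mathbb{E}_n[\{u - 1(Y_i \le \hat\eta_i^m \cdot (d^m(u) + \delta^m))\}(h^m \cdot \hat\eta_i^m)]
&= -\mathbb{E}_n[\{u - 1(Y_i \le Q_{Y|X}(u \mid X_i))\}(h^m \cdot \hat\eta_i^m)] \\
&\quad + \mathbb{E}_n[\{F_{Y|X}(\hat\eta_i^m \cdot (d^m(u) + \delta^m) \mid X_i) - F_{Y|X}(Q_{Y|X}(u \mid X_i) \mid X_i)\}(h^m \cdot \hat\eta_i^m)] \\
&\quad + n^{-1/2}\mathbb{G}_{n|X}[\{1(Y_i \le \hat\eta_i^m \cdot (d^m(u) + \delta^m)) - 1(Y_i \le Q_{Y|X}(u \mid X_i))\}(h^m \cdot \hat\eta_i^m)] \\
&=: I + II + III,
\end{aligned}
$$

where we have used the fact that $F_{Y|X}(Q_{Y|X}(u \mid X) \mid X) = u$ and

$$
\begin{aligned}
&n^{-1/2}\mathbb{G}_{n|X}[\{1(Y_i \le \hat\eta_i^m \cdot (d^m(u) + \delta^m)) - 1(Y_i \le Q_{Y|X}(u \mid X_i))\}(h^m \cdot \hat\eta_i^m)] \\
&:= \mathbb{E}_n[\{1(Y_i \le \hat\eta_i^m \cdot (d^m(u) + \delta^m)) - 1(Y_i \le Q_{Y|X}(u \mid X_i)) \\
&\qquad - F_{Y|X}(\hat\eta_i^m \cdot (d^m(u) + \delta^m) \mid X_i) + F_{Y|X}(Q_{Y|X}(u \mid X_i) \mid X_i)\}(h^m \cdot \hat\eta_i^m)].
\end{aligned}
$$

We separately bound the terms $I$, $II$ and $III$ uniformly in $u \in \mathcal{U}$ and $h^m \in \mathbb{S}^m$. In what follows, stochastic orders are understood not to depend on $M$. Note that

$$
\hat\eta_i^m \cdot (d^m(u) + \delta^m) - Q_{Y|X}(u \mid X_i) = \sum_{j=0}^{m} \hat\eta_{ij}(d_j(u) + \delta_j) - \sum_{j=0}^{\infty} \eta_{ij} d_j(u) =: \hat\eta_i^m \cdot \delta^m + r_i(u). \tag{14}
$$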

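Bounding $I$: observe that

$$
I \ge -\|\mathbb{E}_n[\{u - 1(Y_i \le Q_{Y|X}(u \mid X_i))\}\hat\eta_i^m]\|_{\ell^2}.
$$

Using the relation

$$
1(Y_i \le Q_{Y|X}(u \mid X_i)) = 1(F_{Y|X}(Y_i \mid X_i) \le u) = 1(U_i \le u), \quad \text{with } U_i := F_{Y|X}(Y_i \mid X_i),
$$

we have

$$
\sup_{u \in \mathcal{U}} \|\mathbb{E}_n[\{u - 1(Y_i \le Q_{Y|X}(u \mid X_i))\}\hat\eta_i^m]\|_{\ell^2}^2 \le \sum_{j=0}^{m} \sup_{u \in \mathcal{U}} (\mathbb{E}_n[\{u - 1(U_i \le u)\}\hat\eta_{ij}])^2.
$$

Here, $U_1, \ldots, U_n$ are independent uniform random variables on $(0, 1)$, independent of $X_1^n := \{X_1, \ldots, X_n\}$. Pick any $0 \le j \le m$. Let $\sigma_1, \ldots, \sigma_n$ be independent Rademacher random variables independent of $(U_1, X_1), \ldots, (U_n, X_n)$. Since $U_1, \ldots, U_n$ are independent of $X_1^n$, applying the symmetrization inequality [see Lemma 2.3.1 of 36] conditional on $X_1^n$, we have

$$
E\Big[ \sup_{u \in \mathcal{U}} (\mathbb{E}_n[\{u - 1(U_i \le u)\}\hat\eta_{ij}])^2 \,\Big|\, X_1^n \Big] \le 4E\Big[ \sup_{u \in \mathcal{U}} (\mathbb{E}_n[\sigma_i 1(U_i \le u)\hat\eta_{ij}])^2 \,\Big|\, X_1^n \Big].
$$

We make use of Proposition 3 in Appendix C to bound the right side. Consider the class of functions

$$
\mathcal{G} := \{\mathbb{R} \times \mathbb{R} \ni (y, z) \mapsto 1(y \le u) z : u \in \mathcal{U}\}.
$$

Then, we have

$$
E\Big[ \sup_{u \in \mathcal{U}} (\mathbb{E}_n[\sigma_i 1(U_i \le u)\hat\eta_{ij}])^2 \,\Big|\, X_1^n \Big] = E\Big[ \sup_{g \in \mathcal{G}} (\mathbb{E}_n[\sigma_i g(U_i, \hat\eta_{ij})])^2 \,\Big|\, X_1^n \Big].
$$

It is standard to see that $\mathcal{G}$ is a VC subgraph class with VC index $\le 3$. Thus, by Theorem 2.6.7 of [36], there exist universal constants $A \ge e$ and $W \ge 1$ such that, for the envelope function $G(y, z) = |z|$,

$$
N(\epsilon\|G\|_{L_2(P_n)}, \mathcal{G}, L_2(P_n)) \le (A/\epsilon)^W, \quad 0 < \forall \epsilon \le 1,
$$

where $P_n$ denotes the empirical distribution on $\mathbb{R} \times \mathbb{R}$ that assigns probability $n^{-1}$ to each $(U_i, \hat\eta_{ij})$, $i = 1, \ldots, n$. Therefore, by Proposition 3, we conclude that

$$
E\Big[ \sup_{u \in \mathcal{U}} (\mathbb{E}_n[\sigma_i 1(U_i \le u)\hat\eta_{ij}])^2 \,\Big|\, X_1^n \Big] \le D n^{-1}\mathbb{E}_n[\hat\eta_{ij}^2],
$$

where $D$ is a universal constant. Since $0 \le j \le m$ is arbitrary, we have

$$
\sup_{u \in \mathcal{U}} \|\mathbb{E}_n[\{u - 1(Y_i \le Q_{Y|X}(u \mid X_i))\}\hat\eta_i^m]\|_{\ell^2} = O_p\{(\mathbb{E}_n[\|\hat\eta_i^m\|_{\ell^2}^2])^{1/2} n^{-1/2}\}.
$$

We shall show in Appendix B that

$$
\mathbb{E}_n[\|\hat\eta_i^m\|_{\ell^2}^2] = O_p(m), \tag{15}
$$

by which we have

$$
\sup_{u \in \mathcal{U}} \|\mathbb{E}_n[\{u - 1(Y_i \le Q_{Y|X}(u \mid X_i))\}\hat\eta_i^m]\|_{\ell^2} = O_p(\sqrt{m/n}).
$$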
ueu 

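Bounding $II$: by Taylor's theorem, we have

$$
\begin{aligned}
&F_{Y|X}(Q_{Y|X}(u \mid X) + y \mid X) - F_{Y|X}(Q_{Y|X}(u \mid X) \mid X) \\
&= f_{Y|X}(Q_{Y|X}(u \mid X) \mid X)\, y + y^2 \int_0^1 f'_{Y|X}(Q_{Y|X}(u \mid X) + \theta y \mid X)(1 - \theta)\,d\theta \\
&=: f_{Y|X}(Q_{Y|X}(u \mid X) \mid X)\, y + \frac{y^2}{2} R(u, y, X),
\end{aligned}
$$

by which we have, using (14),

$$
\begin{aligned}
II &= \mathbb{E}_n[f_{Y|X}(Q_{Y|X}(u \mid X_i) \mid X_i)\{\hat\eta_i^m \cdot \delta^m + r_i(u)\}(h^m \cdot \hat\eta_i^m)] \\
&\quad + \mathbb{E}_n\Big[\frac{\{\hat\eta_i^m \cdot \delta^m + r_i(u)\}^2}{2} R(u, \hat\eta_i^m \cdot \delta^m + r_i(u), X_i)(h^m \cdot \hat\eta_i^m)\Big] \\
&\ge M\sqrt{m/n}\, \mathbb{E}_n[f_{Y|X}(Q_{Y|X}(u \mid X_i) \mid X_i)(h^m \cdot \hat\eta_i^m)^2] \\
&\quad - C\mathbb{E}_n[|r_i(u)(h^m \cdot \hat\eta_i^m)|] - C\mathbb{E}_n[\{\hat\eta_i^m \cdot \delta^m + r_i(u)\}^2 |h^m \cdot \hat\eta_i^m|] \\
&\ge M\sqrt{m/n}\, \mathbb{E}_n[f_{Y|X}(Q_{Y|X}(u \mid X_i) \mid X_i)(h^m \cdot \hat\eta_i^m)^2] - C(\mathbb{E}_n[r_i^2(u)])^{1/2}(\mathbb{E}_n[(h^m \cdot \hat\eta_i^m)^2])^{1/2} \\
&\quad - CM^2(m/n)\Big(\max_{1 \le i \le n}\|\hat\eta_i^m\|_{\ell^2}\Big)\mathbb{E}_n[(h^m \cdot \hat\eta_i^m)^2] - C\Big(\max_{1 \le i \le n}\|\hat\eta_i^m\|_{\ell^2}\Big)\mathbb{E}_n[r_i^2(u)], \qquad (16)
\end{aligned}
$$

where we have used the fact that $f_{Y|X}(y \mid X) \vee |f'_{Y|X}(y \mid X)| \le C$. By assumption (A5), there exists a small constant $c_1 > 0$ such that

$$
\mathbb{E}_n[f_{Y|X}(Q_{Y|X}(u \mid X_i) \mid X_i)(h^m \cdot \hat\eta_i^m)^2] \ge c_1\mathbb{E}_n[(h^m \cdot \hat\eta_i^m)^2], \quad \forall u \in \mathcal{U}, \forall h^m \in \mathbb{S}^m.
$$

We shall show in Appendix B that

$$
\sup_{h^m \in \mathbb{S}^m} |\mathbb{E}_n[(h^m \cdot \hat\eta_i^m)^2] - 1| = o_p(1), \tag{17}
$$

$$
\sup_{u \in \mathcal{U}} \mathbb{E}_n[r_i^2(u)] = O_p(mn^{-1}). \tag{18}
$$

Thus, by (13), (15), (17) and (18), we have

$$
(16) \ge c_1 M\sqrt{m/n}(1 - o_p(1)) - O_p(\sqrt{m/n}) - M^2 o_p(\sqrt{m/n}),
$$

where the stochastic orders are evaluated uniformly in $u \in \mathcal{U}$ and $h^m \in \mathbb{S}^m$.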

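Bounding $III$: let $\sigma_1, \ldots, \sigma_n$ be independent Rademacher random variables independent of the data $(Y_1, X_1), \ldots, (Y_n, X_n)$. Applying the symmetrization inequality conditional on $X_1^n := \{X_1, \ldots, X_n\}$, we have

$$
\begin{aligned}
&E\Big[ \sup \big| n^{-1/2}\mathbb{G}_{n|X}[\{1(Y_i \le \hat\eta_i^m \cdot (d^m(u) + \delta^m)) - 1(Y_i \le Q_{Y|X}(u \mid X_i))\}(h^m \cdot \hat\eta_i^m)] \big| \,\Big|\, X_1^n \Big] \\
&\le 2E\Big[ \sup \big| \mathbb{E}_n[\sigma_i\{1(Y_i \le \hat\eta_i^m \cdot (d^m(u) + \delta^m)) - 1(Y_i \le Q_{Y|X}(u \mid X_i))\}(h^m \cdot \hat\eta_i^m)] \big| \,\Big|\, X_1^n \Big], \qquad (19)
\end{aligned}
$$

where the suprema are taken over $u \in \mathcal{U}$ and $h^m \in \mathbb{S}^m$, with $\delta^m = M\sqrt{m/n}\, h^m$. Note that the symmetrization inequality is applicable since the regular conditional distribution of $(Y_1, \ldots, Y_n)'$ given $X_1^n$ exists and, conditional on $X_1^n$, $Y_1, \ldots, Y_n$ are independent. Consider the class of functions

$$
\mathcal{G} := \{\mathbb{R} \times D[0, 1] \times \mathbb{R}^{m+1} \ni (y, x, \eta^m) \mapsto \{1(y \le \eta^m \cdot (d^m(u) + \delta^m)) - 1(y \le Q_{Y|X}(u \mid x))\}(h^m \cdot \eta^m) : u \in \mathcal{U}, h^m \in \mathbb{S}^m, \delta^m = M\sqrt{m/n}\, h^m\}.
$$

Then, we have

$$
(19) = 2E\Big[ \sup_{g \in \mathcal{G}} |\mathbb{E}_n[\sigma_i g(Y_i, X_i, \hat\eta_i^m)]| \,\Big|\, X_1^n \Big].
$$

We apply Proposition 2 in Appendix C to bound the right side. Note that $(X_i, \hat\eta_i^m)$ are measurable with respect to the $\sigma$-field generated by $X_1^n$, the regular conditional distribution of $(Y_1, \ldots, Y_n)'$ given $X_1^n$ exists, and, conditional on $X_1^n$, $Y_1, \ldots, Y_n$ are independent. Observe that

$$
\sup_{g \in \mathcal{G}} |g(Y_i, X_i, \hat\eta_i^m)| \le \max_{1 \le i \le n} \|\hat\eta_i^m\|_{\ell^2} =: B,
$$

and, by (14),

$$
\begin{aligned}
\sup_{g \in \mathcal{G}} \mathbb{E}_n[E[g^2(Y_i, X_i, \hat\eta_i^m) \mid X_1^n]]
&= \sup \mathbb{E}_n[|F_{Y|X}(\hat\eta_i^m \cdot (d^m(u) + \delta^m) \mid X_i) - F_{Y|X}(Q_{Y|X}(u \mid X_i) \mid X_i)|(h^m \cdot \hat\eta_i^m)^2] \\
&\le \sup C\mathbb{E}_n[|\hat\eta_i^m \cdot \delta^m + r_i(u)|(h^m \cdot \hat\eta_i^m)^2] \\
&\le CM\sqrt{m/n}\max_{1 \le i \le n}\|\hat\eta_i^m\|_{\ell^2} \sup_{h^m \in \mathbb{S}^m}\mathbb{E}_n[(h^m \cdot \hat\eta_i^m)^2] \\
&\quad + C\max_{1 \le i \le n}\|\hat\eta_i^m\|_{\ell^2}\Big(\sup_{u \in \mathcal{U}}\mathbb{E}_n[r_i^2(u)]\Big)^{1/2}\Big(\sup_{h^m \in \mathbb{S}^m}\mathbb{E}_n[(h^m \cdot \hat\eta_i^m)^2]\Big)^{1/2} =: \tau^2.
\end{aligned}
$$

We shall show in Appendix B that there exist some constants $C_2 \ge 1$ and $A' \ge 3\sqrt{e}$ such that

$$
N(B\epsilon, \mathcal{G}, L_2(P_n)) \le (A'/\epsilon)^{C_2 m}, \quad 0 < \forall \epsilon \le 1, \tag{20}
$$

where $P_n$ denotes the empirical distribution on $\mathbb{R} \times D[0, 1] \times \mathbb{R}^{m+1}$ that assigns probability $n^{-1}$ to each $(Y_i, X_i, \hat\eta_i^m)$, $i = 1, \ldots, n$. Therefore, by Proposition 2, we conclude that

$$
E\Big[ \sup_{g \in \mathcal{G}} |\mathbb{E}_n[\sigma_i g(Y_i, X_i, \hat\eta_i^m)]| \,\Big|\, X_1^n \Big] \le 1(\tau > 0) D' \Big[ \sqrt{\frac{C_2 m\tau^2}{n}\log\frac{A'B}{\tau}} + \frac{C_2 mB}{n}\log\frac{A'B}{\tau} \Big], \tag{21}
$$

provided that $\tau \le B$, where $D'$ is a universal constant. By (13), (15), (17) and (18), and the fact that $M \ge 1$, we have

$$
B = o_p\{(\log n)^{-1}\sqrt{n/m}\}, \qquad \tau^2 = M o_p\{(\log n)^{-1}\},
$$

and there exists a small constant $c_3 > 0$ such that, with probability approaching one,

$$
\tau^2 \ge c_3 B\sqrt{m/n}.
$$

Thus, replacing $\tau$ by $\tau \wedge B$ if necessary, (21) $= M^{1/2} o_p(\sqrt{m/n})$, by which we conclude that

$$
III \ge -M^{1/2} o_p(\sqrt{m/n}),
$$

where the stochastic order is evaluated uniformly in $u \in \mathcal{U}$ and $h^m \in \mathbb{S}^m$. Taking these together, we now conclude that

$$
I + II + III \ge c_1 M\sqrt{m/n}(1 - o_p(1)) - O_p(\sqrt{m/n}) - M^2 o_p(\sqrt{m/n}),
$$

where the stochastic orders are evaluated uniformly in $u \in \mathcal{U}$ and $h^m \in \mathbb{S}^m$. This immediately implies the hypothesis of Lemma 1.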
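The second-order Taylor expansion used for bounding $II$ can be checked numerically for a concrete conditional distribution. The sketch below takes $F_{Y|X}$ to be the standard normal distribution function, an illustrative assumption only, and verifies that the remainder $F(q + y) - F(q) - f(q)y$, i.e. the term $(y^2/2)R(u, y, X)$, is bounded by $\tfrac{1}{2}\sup_x|f'(x)|\,y^2$. For the normal density, $f'(x) = -xf(x)$ and $\sup_x|f'(x)| = f(1)$. Helper names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function, used here as an illustrative F_{Y|X}."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density f = F'."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def taylor_remainder(q, y):
    """F(q + y) - F(q) - f(q) y, i.e. the term (y^2 / 2) R(u, y, X) of the expansion."""
    return norm_cdf(q + y) - norm_cdf(q) - norm_pdf(q) * y

# f'(x) = -x f(x) for the normal density, and |x| f(x) is maximised at |x| = 1.
SUP_ABS_DENSITY_DERIV = norm_pdf(1.0)
```

The quadratic bound on the remainder is what turns the second term of $II$ into the $\{\hat\eta_i^m \cdot \delta^m + r_i(u)\}^2$ terms in (16).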

5.3. Completion of the proof 

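We have shown that

$$
\sup_{u \in \mathcal{U}} \|\hat d^m(u) - d^m(u)\|_{\ell^2} = O_p(\sqrt{m/n}),
$$

which immediately implies that

$$
\sup_{u \in \mathcal{U}} \|\hat b^m(u) - b^m(u)\|_{\ell^2}^2 = O_p(n^{-1} m\kappa_m^{-1}) = O_p(m^{\alpha+1} n^{-1}) = O_p(n^{-(2\beta-1)/(\alpha+2\beta)}).
$$

Observe that

$$
\begin{aligned}
\hat b(t, u) - b(t, u) &= \sum_{j=1}^{m}\hat b_j(u)\hat\phi_j(t) - \sum_{j=1}^{m} b_j(u)\phi_j(t) - \sum_{j=m+1}^{\infty} b_j(u)\phi_j(t) \\
&= \sum_{j=1}^{m}(\hat b_j(u) - b_j(u))\hat\phi_j(t) + \sum_{j=1}^{m} b_j(u)(\hat\phi_j - \phi_j)(t) - \sum_{j=m+1}^{\infty} b_j(u)\phi_j(t),
\end{aligned}
$$

so that, uniformly in $u \in \mathcal{U}$,

$$
\int_0^1 \{\hat b(t, u) - b(t, u)\}^2\,dt \le 3\|\hat b^m(u) - b^m(u)\|_{\ell^2}^2 + 3m\sum_{j=1}^{m} b_j^2(u)\|\hat\phi_j - \phi_j\|^2 + 3\sum_{j=m+1}^{\infty} b_j^2(u).
$$

By the proof of (18) in Appendix B, we see that, uniformly in $u \in \mathcal{U}$,

$$
m\sum_{j=1}^{m} b_j^2(u)\|\hat\phi_j - \phi_j\|^2 = O_p(n^{-(2\beta-1)/(\alpha+2\beta)}).
$$

This completes the proof of Theorem 1. □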
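The rate arithmetic can be verified exactly with rational exponents. Assuming the cut-off is chosen as $m \asymp n^{1/(\alpha+2\beta)}$ (the choice under which the displayed rates follow), the sketch below computes the exponent of $n$ in $m^{\alpha+1}/n$ and in $m/n$, and compares them with $-(2\beta-1)/(\alpha+2\beta)$ and with $-(\alpha+2\beta-1)/(\alpha+2\beta)$, the latter being the rate for $\hat Q_{Y|X}$ obtained in the proof of Theorem 2. Function names are illustrative.

```python
from fractions import Fraction

def rate_exponent_slope(alpha, beta):
    """Exponent e such that m^(alpha + 1) / n = n^e when m = n^(1/(alpha + 2 beta))."""
    alpha, beta = Fraction(alpha), Fraction(beta)
    return (alpha + 1) / (alpha + 2 * beta) - 1

def rate_exponent_quantile(alpha, beta):
    """Exponent e such that m / n = n^e for the same choice of m."""
    alpha, beta = Fraction(alpha), Fraction(beta)
    return 1 / (alpha + 2 * beta) - 1
```

Holding $\beta$ fixed, the first exponent increases with $\alpha$ while the second decreases, which matches the simulation finding that a larger $\alpha$ hurts $\hat b(\cdot, 0.5)$ but helps $\hat Q_{Y|X}(0.5 \mid x)$.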
6. Proof of Theorem 2 

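Without loss of generality, we may assume that $E[X(t)] = 0$ for all $t \in [0, 1]$. Let $X_{n+1}$ be a copy of $X$ independent of the data $\mathcal{D}_n := \{(Y_1, X_1), \ldots, (Y_n, X_n)\}$. Then,

$$
\mathcal{E}(\hat Q_{Y|X}, u) = E[\{\hat Q_{Y|X}(u \mid X_{n+1}) - Q_{Y|X}(u \mid X_{n+1})\}^2 \mid \mathcal{D}_n].
$$

Let $X_{n+1} = \sum_{j=1}^{\infty}\xi_{n+1,j}\phi_j$. Observe that

$$
\hat Q_{Y|X}(u \mid X_{n+1}) = \hat a(u) + \sum_{j=1}^{m}\hat b_j(u)\xi_{n+1,j} + \int_0^1 X_{n+1}(t)\sum_{j=1}^{m}\hat b_j(u)(\hat\phi_j - \phi_j)(t)\,dt - \int_0^1 \bar X(t)\hat b(t, u)\,dt,
$$

and

$$
Q_{Y|X}(u \mid X_{n+1}) = a(u) + \sum_{j=1}^{m} b_j(u)\xi_{n+1,j} + \sum_{j=m+1}^{\infty} b_j(u)\xi_{n+1,j}.
$$

Letting $\eta_{n+1,j} = \kappa_j^{-1/2}\xi_{n+1,j}$ (and $\eta_{n+1,0} = 1$), we have

$$
\begin{aligned}
\{\hat Q_{Y|X}(u \mid X_{n+1}) - Q_{Y|X}(u \mid X_{n+1})\}^2 \le C\Big[ &\Big\{\sum_{j=0}^{m}(\hat d_j(u) - d_j(u))\eta_{n+1,j}\Big\}^2 + \Big\{\sum_{j=m+1}^{\infty} b_j(u)\xi_{n+1,j}\Big\}^2 \\
&+ \Big\{\int_0^1 X_{n+1}(t)\sum_{j=1}^{m}\hat b_j(u)(\hat\phi_j - \phi_j)(t)\,dt\Big\}^2 + \Big\{\int_0^1 \bar X(t)\hat b(t, u)\,dt\Big\}^2 \Big].
\end{aligned}
$$

Taking expectation with respect to $X_{n+1}$, we have

$$
\begin{aligned}
E[\{\hat Q_{Y|X}(u \mid X_{n+1}) - Q_{Y|X}(u \mid X_{n+1})\}^2 \mid \mathcal{D}_n] \le C\Big[ &\|\hat d^m(u) - d^m(u)\|_{\ell^2}^2 + \sum_{j=m+1}^{\infty}\kappa_j b_j^2(u) \\
&+ \int_0^1\Big\{\sum_{j=1}^{m}\hat b_j(u)(\hat\phi_j - \phi_j)(t)\Big\}^2 dt + \Big(\int_0^1 \bar X(t)\hat b(t, u)\,dt\Big)^2 \Big].
\end{aligned}
$$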



imsart-generic ver . 2011/11/15 file : FunctionQR-rev. tex date : February 23 , 2012 



K. Kato/ Functional quantile regression 27 

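By the previous proof, we have

$$
\sup_{u \in \mathcal{U}}\|\hat d^m(u) - d^m(u)\|_{\ell^2}^2 = O_p(m/n) = O_p(n^{-(\alpha+2\beta-1)/(\alpha+2\beta)}), \qquad \sup_{u \in \mathcal{U}}\sum_{j=m+1}^{\infty}\kappa_j b_j^2(u) = O(n^{-(\alpha+2\beta-1)/(\alpha+2\beta)}).
$$

Observe that

$$
\sum_{j=1}^{m}\hat b_j(u)(\hat\phi_j - \phi_j)(t) = \sum_{j=1}^{m}(\hat b_j(u) - b_j(u))(\hat\phi_j - \phi_j)(t) + \sum_{j=1}^{m} b_j(u)(\hat\phi_j - \phi_j)(t),
$$

by which we have

$$
\int_0^1\Big\{\sum_{j=1}^{m}\hat b_j(u)(\hat\phi_j - \phi_j)(t)\Big\}^2 dt \le 2\|\hat d^m(u) - d^m(u)\|_{\ell^2}^2\sum_{j=1}^{m}\kappa_j^{-1}\|\hat\phi_j - \phi_j\|^2 + 2m\sum_{j=1}^{m} b_j^2(u)\|\hat\phi_j - \phi_j\|^2.
$$

By the previous proof, we see that

$$
m\sum_{j=1}^{m} b_j^2(u)\|\hat\phi_j - \phi_j\|^2 = O_p(mn^{-1}) = O_p(n^{-(\alpha+2\beta-1)/(\alpha+2\beta)}),
$$

while by the proof of (15), we have $\sum_{j=1}^{m}\kappa_j^{-1}\|\hat\phi_j - \phi_j\|^2 = o_p(1)$. Thus, we conclude that

$$
\sup_{u \in \mathcal{U}}\int_0^1\Big\{\sum_{j=1}^{m}\hat b_j(u)(\hat\phi_j - \phi_j)(t)\Big\}^2 dt = O_p(n^{-(\alpha+2\beta-1)/(\alpha+2\beta)}).
$$

Finally, we have

$$
\Big|\int_0^1 \bar X(t)\hat b(t, u)\,dt\Big|^2 \le \int_0^1 \bar X^2(t)\,dt\int_0^1 \hat b^2(t, u)\,dt = O_p(n^{-1}) \times O_p(1) = O_p(n^{-1}),
$$

uniformly in $u \in \mathcal{U}$. Taking these together, we conclude that

$$
\sup_{u \in \mathcal{U}} E[\{\hat Q_{Y|X}(u \mid X_{n+1}) - Q_{Y|X}(u \mid X_{n+1})\}^2 \mid \mathcal{D}_n] = O_p(n^{-(\alpha+2\beta-1)/(\alpha+2\beta)}).
$$

This completes the proof. □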
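Appendix A: Proof of Proposition 1

Consider the same construction as in [22]. Let $\phi_1(t) = 1$ and $\phi_j(t) = 2^{1/2}\cos(j\pi t)$ for $j > 1$. Put $g_j = \theta_j j^{-\beta}$ for $[n^{1/(\alpha+2\beta)}] + 1 \le j \le 2[n^{1/(\alpha+2\beta)}]$ and $g_j = 0$ otherwise, where $[y]$ denotes the integer part of $y \in \mathbb{R}$ and each $\theta_j$ is either 0 or 1. Let $Z_1, Z_2, \ldots \sim U[-3^{1/2}, 3^{1/2}]$ i.i.d. Take $X(t) = \sum_{j=1}^{\infty} j^{-\alpha/2} Z_j\phi_j(t)$ and $g(t) = \sum_{j=[n^{1/(\alpha+2\beta)}]+1}^{2[n^{1/(\alpha+2\beta)}]} g_j\phi_j(t)$. Consider a sequence of data generating processes

$$
Y = \int_0^1 g(t)X(t)\,dt + \epsilon = \sum_{j=[n^{1/(\alpha+2\beta)}]+1}^{2[n^{1/(\alpha+2\beta)}]} \theta_j j^{-(\alpha+2\beta)/2} Z_j + \epsilon, \qquad \epsilon \sim N(0, 1),
$$

with $\epsilon$ independent of $X$. Then, we have

$$
Q_{Y|X}(u \mid X) = a(u) + \int_0^1 b(t, u)X(t)\,dt, \quad \text{with } a(u) = \Phi^{-1}(u), \quad b(t, u) = g(t), \quad f_{Y|X}(y \mid X) = \varphi\Big(y - \int_0^1 g(t)X(t)\,dt\Big),
$$

by which one sees that assumptions (A4)-(A6) are satisfied. Here, $\varphi(\cdot)$ and $\Phi(\cdot)$ are the density and the distribution function of the standard normal distribution, respectively. Suppose that $\alpha \le 3$. Then, since for any $0 < \gamma < \alpha - 1$, $t \mapsto \cos(t)$ is $\gamma/2$-Hölder continuous (by the periodicity of the cosine function), we have

$$
E[\{X(s) - X(t)\}^2] \le C|s - t|^{\gamma}\sum_{j=1}^{\infty} j^{-\alpha+\gamma} \le C'|s - t|^{\gamma}, \quad \forall s, t \in [0, 1],
$$

where $C$ and $C'$ are some constants. This shows that Assumption (A8) is satisfied with $0 < \gamma < \alpha - 1$ when $\alpha \le 3$. For $\alpha > 3$, $K(s, t)$ is twice continuously differentiable, so that Assumption (A8) is satisfied with $0 < \gamma < 2$. Finally, by [22], for any estimator $(t, u) \mapsto \hat b(t, u)$,

$$
\begin{aligned}
\sup\nolimits^* \sup_{u \in \mathcal{U}} \int_0^1 E[\{\hat b(t, u) - b(t, u)\}^2]\,dt
&\ge \sup\nolimits^* \int_0^1 E[\{\hat b(t, u_0) - g(t)\}^2]\,dt \quad (u_0 \text{ is any point in } \mathcal{U}) \\
&\ge D n^{-(2\beta-1)/(\alpha+2\beta)},
\end{aligned}
$$

where $\sup^*$ denotes the supremum over all $2^{[n^{1/(\alpha+2\beta)}]}$ different distributions of $(Y, X)$ obtained by taking different choices of $\theta_{[n^{1/(\alpha+2\beta)}]+1}, \ldots, \theta_{2[n^{1/(\alpha+2\beta)}]}$, and $D > 0$ is a constant. The other assertions follow similarly. This completes the proof. □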

Appendix B: Proofs of (13), (15), (17), (18) and (20) 

In this section, we provide the proofs of (13), (15), (17), (18) and (20) omitted in Section 5. Throughout the section, we assume all the conditions of Theorem 1.
Without loss of generality, we may assume that $E[X(t)] = 0$ for all $t \in [0, 1]$. Define the (infeasible) empirical covariance kernel

$$
K^*(s, t) = \mathbb{E}_n[(X_i(s) - \bar X(s))(X_i(t) - \bar X(t))],
$$

where $\bar X(t) = n^{-1}\sum_{i=1}^{n} X_i(t)$. Let $K^*(s, t) = \sum_{j=1}^{\infty}\kappa_j^*\phi_j^*(s)\phi_j^*(t)$ be the spectral expansion of $K^*(s, t)$, where $\kappa_1^* \ge \kappa_2^* \ge \cdots \ge 0$ and $\{\phi_j^*\}_{j=1}^{\infty}$ is an orthonormal basis for $L_2[0, 1]$. Without loss of generality, we may assume that

$$
\int \phi_j^*\hat\phi_j \ge 0, \qquad \int \phi_j\phi_j^* \ge 0, \qquad \forall j \ge 1.
$$

Here, to ease the notation, $\int_0^1 f(t)\,dt$ is abbreviated as $\int f$ for any function $f : [0, 1] \to \mathbb{R}$. Define

$$
\xi_{ij}^* = \int (X_i - \bar X)\phi_j^*.
$$

Recall that $\eta_{ij} = \kappa_j^{-1/2}\xi_{ij} = \kappa_j^{-1/2}\int X_i\phi_j$ and $\hat\eta_{ij} = \kappa_j^{-1/2}\hat\xi_{ij} = \kappa_j^{-1/2}\int(\hat X_i - \overline{\hat X})\hat\phi_j$. We will frequently use the following decomposition: for $j \ge 1$,







= m3 - vt 














\x, - 


- X^)4>J - 








-1/2 






































A,6- 







1/2 



X((^j - 4)*) 
X{^J - 



X(j)j - / X{yj 



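We prepare some lemmas. For any function $R : [0, 1]^2 \to \mathbb{R}$, define $|||R||| := (\int\int R^2(s, t)\,ds\,dt)^{1/2}$. Recall that $\|f\|^2 = \int f^2(t)\,dt$ for any $f : [0, 1] \to \mathbb{R}$.

Lemma 3. We have

$$
\mathbb{E}_n[\|\hat X_i - X_i\|^2] = O_p(\Delta^2), \qquad |||\hat K - K^*|||^2 = O_p(\Delta^2).
$$

Furthermore, as $n \to \infty$, with probability approaching one,

$$
\|\hat\phi_j - \phi_j^*\| \le Cj^{\alpha+1}|||\hat K - K^*|||, \quad 1 \le \forall j \le m.
$$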

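Proof. Observe that

$$
\hat X_i(t) - X_i(t) = \sum_{l}(X_i(t_{il}) - X_i(t))\,1(t \in [t_{il}, t_{i,l+1})), \quad t \in [0, 1),
$$

by which we have

$$
\|\hat X_i - X_i\|^2 = \sum_{l}\int_{t_{il}}^{t_{i,l+1}}(X_i(t_{il}) - X_i(t))^2\,dt.
$$

Taking expectation, we have

$$
E[\|\hat X_i - X_i\|^2] = \sum_{l}\int_{t_{il}}^{t_{i,l+1}}\{K(t, t) - 2K(t, t_{il}) + K(t_{il}, t_{il})\}\,dt \le C\sum_{l}\int_{t_{il}}^{t_{i,l+1}}(t - t_{il})^{\gamma}\,dt,
$$

where we have used assumption (A8). This leads to the first assertion. The second assertion follows from the Schwarz inequality and the first assertion. The third assertion needs some effort. By Bosq [5, Lemmas 4.2 and 4.3; see also the remark below], we have

$$
\sup_{j \ge 1}|\kappa_j^* - \kappa_j| \le |||K^* - K|||, \qquad \sup_{j \ge 1}\chi_j\|\hat\phi_j - \phi_j^*\| \le 8^{1/2}|||\hat K - K^*|||, \tag{22}
$$

where $\chi_j = \min\{\kappa_{j-1}^* - \kappa_j^*, \kappa_j^* - \kappa_{j+1}^*\}$ for $j \ge 2$ and $\chi_1 = \kappa_1^* - \kappa_2^*$. For some small constant $c > 0$, define the event

$$
\mathcal{E}_n = \{\chi_j \ge c j^{-\alpha-1},\ 1 \le \forall j \le m\}.
$$

It suffices to show that $P(\mathcal{E}_n) \to 1$. By the first inequality in (22),

$$
\kappa_k^* - \kappa_{k+1}^* \ge \kappa_k - \kappa_{k+1} - 2|||K^* - K||| \ge C^{-1}k^{-\alpha-1} - 2|||K^* - K|||.
$$

Since $k^{-\alpha-1} \ge m^{-\alpha-1} \asymp n^{-(\alpha+1)/(\alpha+2\beta)}$ for $1 \le k \le m$, $|||K^* - K||| = O_p(n^{-1/2})$ (which follows by a simple calculation), and $n^{-1/2} = o(n^{-(\alpha+1)/(\alpha+2\beta)})$ (which follows from $\beta > \alpha/2 + 1$), we have, uniformly in $1 \le k \le m$,

$$
\kappa_k^* - \kappa_{k+1}^* \ge C^{-1}(1 - o_p(1))k^{-\alpha-1},
$$

which leads to $P(\mathcal{E}_n) \to 1$ by taking $c$ sufficiently small. □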
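The step-function reconstruction $\hat X_i$ used in this proof can be illustrated numerically. The sketch below builds $\hat X$ from values at equally spaced points and approximates $\|\hat X - X\|^2$ by a midpoint rule. The path $\sin(2\pi t)$ is a hypothetical smooth stand-in for a realization of $X$, and the helper names are illustrative. For such a differentiable path the error behaves like a constant times $h^2$ for mesh $h$, mirroring the bound $C\sum_l\int_{t_{il}}^{t_{i,l+1}}(t - t_{il})^{\gamma}\,dt$ with $\gamma = 2$.

```python
import bisect
import math

def step_interpolant(points, values):
    """Piecewise-constant reconstruction: hat X(t) = X(t_l) for t in [t_l, t_{l+1})."""
    def x_hat(t):
        return values[bisect.bisect_right(points, t) - 1]
    return x_hat

def l2_error(x, points, grid_size=20_000):
    """Midpoint-rule approximation of ||hat X - X||^2 on [0, 1]; points must start at 0."""
    x_hat = step_interpolant(points, [x(t) for t in points])
    h = 1.0 / grid_size
    return h * sum((x_hat((k + 0.5) * h) - x((k + 0.5) * h)) ** 2 for k in range(grid_size))

def path(t):
    """Hypothetical smooth sample path used only for illustration."""
    return math.sin(2.0 * math.pi * t)
```

Halving the mesh should divide the error by roughly four.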

Remark 1. Lemma 4.3 of [5] reads as follows: for functions $Q, R : [0, 1]^2 \to \mathbb{R}$ having spectral expansions in $L_2[0, 1]^2$ of the form $Q(s, t) = \sum_{j=1}^{\infty}\lambda_j\psi_j(s)\psi_j(t)$ and $R(s, t) = \sum_{j=1}^{\infty}\mu_j\tilde\psi_j(s)\tilde\psi_j(t)$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$, $\mu_1 \ge \mu_2 \ge \cdots \ge 0$, and $\{\psi_j\}$ and $\{\tilde\psi_j\}$ are orthonormal bases for $L_2[0, 1]$, we have $\chi_j\|\tilde\psi_j - \psi_j\| \le 8^{1/2}|||R - Q|||$ for all $j \ge 1$ such that $\chi_j > 0$, where $\chi_j = \min\{\lambda_{j-1} - \lambda_j, \lambda_j - \lambda_{j+1}\}$ for $j \ge 2$ and $\chi_1 = \lambda_1 - \lambda_2$. Here, we have assumed that $\int\psi_j\tilde\psi_j \ge 0$ for all $j \ge 1$. The lemma actually holds in the form $\sup_{j \ge 1}\chi_j\|\tilde\psi_j - \psi_j\| \le 8^{1/2}|||R - Q|||$, since the inequality trivially holds when $\chi_j = 0$.
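The first inequality in (22) is a perturbation bound of Weyl type: ordered eigenvalues move by at most the Hilbert-Schmidt distance between the kernels. A finite-dimensional analogue can be checked directly for symmetric $2 \times 2$ matrices, whose eigenvalues have a closed form. This illustrates the inequality only, not Bosq's proof, and the helper names are hypothetical.

```python
import math
import random

def eig_sym2(a, b, c):
    """Eigenvalues, in decreasing order, of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean + radius, mean - radius

def hs_norm2(a, b, c):
    """Hilbert-Schmidt (Frobenius) norm of [[a, b], [b, c]]."""
    return math.sqrt(a * a + 2.0 * b * b + c * c)

def max_eig_gap(m1, m2):
    """sup_j |lambda_j(M1) - lambda_j(M2)| for symmetric 2 x 2 matrices given as (a, b, c)."""
    return max(abs(x - y) for x, y in zip(eig_sym2(*m1), eig_sym2(*m2)))
```

The Hilbert-Schmidt norm dominates the operator norm, which is why it can stand in for $|||K^* - K|||$ here.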

The following useful result was established in [22]. 



< 10 5] K-^fc)-' I 



Furthermore, we have 



{K* - K){s, t)(j)j{s)(j>k{t)dsdt'j , 1 < Vj < m, 



(X* - K){s,t)(j>j{s)(f>k{t)dsdt^ 



uniformly in 1 < j < m. 

Proof. See Hall and Horowitz [22, pp. 83-84]. □

B.1. Proofs of (13) and (15)

We first prove (15). By Lemmas 3 and 4, we have



^ n m [ml 

1=1 J=l [i=l 



J=lJ=l [j=l J 



Op(l) X Op(n-iE7=ir+') = Op( 



)=Op(l). 



Similarly, we have 



rn 

^A^^ = Op(n-i?7i3"+3A^) -op(l). 



^A2, = 0p(m" 



Op(l). 



i=i 



Using the decomposition $X_i(t) = \sum_{k=1}^{\infty}\xi_{ik}\phi_k(t)$, we have



Aj6 = 



-1/2 
by which we have

m 

^ A% ^ Op{E[J:^^,A%]) = Op{j:]l,E[fj^]) - Op(mn-i) = op(l). 

Finally, by a direct calculation, we have

$$
\mathbb{E}_n[\|\eta_i^m\|_{\ell^2}^2] = O_p(m).
$$

Taking these together, we obtain (15).

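We now turn to the proof of (13). Observe that

$$
\begin{aligned}
\max_{1 \le i \le n}\sum_{j=1}^{m}\Delta_{ij1}^2 &\le \max_{1 \le i \le n}\|\hat X_i - X_i\|^2 \times O_p(m^{\alpha+1}), \\
\max_{1 \le i \le n}\sum_{j=1}^{m}\Delta_{ij2}^2 &\le \max_{1 \le i \le n}\|X_i\|^2 \times O_p(m^{3\alpha+3}\Delta^2), \\
\max_{1 \le i \le n}\sum_{j=1}^{m}\Delta_{ij3}^2 &\le \max_{1 \le i \le n}\|X_i\|^2 \times O_p(m^{\alpha+3}n^{-1}).
\end{aligned}
$$

Since $\int E[X^4(t)]\,dt \le C$, we have

$$
\max_{1 \le i \le n}\|X_i\|^2 = O_p(n^{1/2}),
$$

which leads to

$$
\max_{1 \le i \le n}\sum_{j=1}^{m}\Delta_{ij2}^2 = O_p(n^{1/2}m^{3\alpha+3}\Delta^2), \qquad \max_{1 \le i \le n}\sum_{j=1}^{m}\Delta_{ij3}^2 = O_p(m^{\alpha+3}n^{-1/2}).
$$

Using the trivial bound $\max_{1 \le i \le n}\|\hat X_i - X_i\|^2 \le \sum_{i=1}^{n}\|\hat X_i - X_i\|^2$, we also have

$$
\max_{1 \le i \le n}\sum_{j=1}^{m}\Delta_{ij1}^2 = O_p(nm^{\alpha+1}\Delta^2).
$$

Similarly, since $E[\eta_{ij}^4] = \kappa_j^{-2}E[\xi_{ij}^4] \le C$ by Assumption (A2), we have

$$
\max_{1 \le i \le n}\sum_{j=1}^{m}\eta_{ij}^2 = O_p(mn^{1/2}).
$$

Taking these together, we have

$$
\max_{1 \le i \le n}\|\hat\eta_i^m\|_{\ell^2}^2 = O_p(nm^{\alpha+1}\Delta^2 + n^{1/2}m^{3\alpha+3}\Delta^2 + m^{\alpha+3}n^{-1/2} + mn^{1/2}).
$$

Since $\alpha > 1$, $\beta > \alpha/2 + 1$ and $m^{3\alpha+3}\Delta^2 \to 0$, there exists a small constant $c > 0$ (depending on $\alpha$ and $\beta$) such that the right side is $O_p\{n^{-c}(n/m)\}$. This implies (13). □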
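B.2. Proofs of (17) and (18)

We first prove (17). Observe that

$$
(h^m \cdot \hat\eta_i^m)^2 = (h^m \cdot \eta_i^m)^2 + 2(h^m \cdot \eta_i^m)\{h^m \cdot (\hat\eta_i^m - \eta_i^m)\} + \{h^m \cdot (\hat\eta_i^m - \eta_i^m)\}^2,
$$

by which we have, for all $h^m \in \mathbb{S}^m$,

$$
|\mathbb{E}_n[(h^m \cdot \hat\eta_i^m)^2] - \mathbb{E}_n[(h^m \cdot \eta_i^m)^2]| \le 2(\mathbb{E}_n[(h^m \cdot \eta_i^m)^2])^{1/2}(\mathbb{E}_n[\|\hat\eta_i^m - \eta_i^m\|_{\ell^2}^2])^{1/2} + \mathbb{E}_n[\|\hat\eta_i^m - \eta_i^m\|_{\ell^2}^2].
$$

By the proof of (15), we have

$$
\mathbb{E}_n[\|\hat\eta_i^m - \eta_i^m\|_{\ell^2}^2] = o_p(1).
$$

On the other hand, by Rudelson's inequality (Theorem 3 in Appendix C), we have

$$
E\Big[\sup_{h^m \in \mathbb{S}^m}|\mathbb{E}_n[(h^m \cdot \eta_i^m)^2] - 1|\Big] \le C\sqrt{\frac{\log n}{n}E\Big[\max_{1 \le i \le n}\|\eta_i^m\|_{\ell^2}^2\Big]},
$$

provided that the right side is smaller than 1. Since $E[\eta_{ij}^4] = \kappa_j^{-2}E[\xi_{ij}^4] \le C$ for all $j \ge 1$ by Assumption (A2), Lemma 5 in Appendix C gives

$$
E\Big[\max_{1 \le i \le n}\|\eta_i^m\|_{\ell^2}^2\Big] = O(mn^{1/2}).
$$

Therefore, we conclude that

$$
\sup_{h^m \in \mathbb{S}^m}|\mathbb{E}_n[(h^m \cdot \eta_i^m)^2] - 1| = O_p(n^{-1/4}m^{1/2}(\log n)^{1/2}) = o_p(1),
$$

so that, uniformly in $h^m \in \mathbb{S}^m$,

$$
\mathbb{E}_n[(h^m \cdot \hat\eta_i^m)^2] = \mathbb{E}_n[(h^m \cdot \eta_i^m)^2] + o_p(1) + O_p(1) \times o_p(1) = 1 + o_p(1).
$$

This completes the proof of (17).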

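We now turn to the proof of (18). Observe that

$$
r_i^2(u) \le 2\{(\hat\eta_i^m - \eta_i^m) \cdot d^m(u)\}^2 + 2\Big\{\sum_{j=m+1}^{\infty}\eta_{ij}d_j(u)\Big\}^2.
$$

Since $E[\eta_{ij}] = 0$, $E[\eta_{ij}^2] = 1$ and $E[\eta_{ij}\eta_{ik}] = 0$ for all $j \neq k$, we have

$$
E\Big[\Big\{\sum_{j=m+1}^{\infty}\eta_{ij}d_j(u)\Big\}^2\Big] = \sum_{j=m+1}^{\infty}\sum_{k=m+1}^{\infty}d_j(u)d_k(u)E[\eta_{ij}\eta_{ik}] = \sum_{j=m+1}^{\infty}d_j^2(u).
$$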
By the proof of (15), we also have

m 

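Here, we have

$$
\sum_{j=1}^{m} j^{-2\beta+2\alpha+2} = \begin{cases} O(1), & \text{if } -2\beta + 2\alpha + 2 < -1, \\ O(\log n), & \text{if } -2\beta + 2\alpha + 2 = -1, \\ O(m^{-2\beta+2\alpha+3}), & \text{if } -2\beta + 2\alpha + 2 > -1. \end{cases}
$$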

Since m-^^'+^a+s ^ ^-i„j3a+3 ^^^^ m^°'+3 „ ^hen -2/3 + 2q! + 2 = -1, we 
have 

m 

^ j-"-2'3(„-ij"+2+A^j3a+2) = 0(n-i+A'^+n-i(logn)m3"+3A'^) = O(n-i). 

Taking these together, we obtain (18). This completes the proof. □ 

B.3. Proof of (20)

Consider the classes of functions

$$
\mathcal{G}_1 := \{\mathbb{R} \times D[0, 1] \times \mathbb{R}^{m+1} \ni (y, x, \eta^m) \mapsto 1(y \le \eta^m \cdot (d^m(u) + \delta^m))(h^m \cdot \eta^m) : u \in \mathcal{U}, h^m \in \mathbb{S}^m, \delta^m = M\sqrt{m/n}\,h^m\}
$$

and

$$
\mathcal{G}_2 := \{\mathbb{R} \times D[0, 1] \times \mathbb{R}^{m+1} \ni (y, x, \eta^m) \mapsto 1(y \le Q_{Y|X}(u \mid x))(h^m \cdot \eta^m) : u \in \mathcal{U}, h^m \in \mathbb{S}^m\}.
$$

It is relatively standard to see that $\mathcal{G}_1$ is a VC subgraph class with VC index bounded by $cm$ for some constant $c \ge 1$ [see 2, Lemma 18]. For $\mathcal{G}_2$, observe first that $1(y \le Q_{Y|X}(u \mid x)) = 1(F_{Y|X}(y \mid x) \le u)$. Since $F_{Y|X}(y \mid x)$ is a fixed function, it can also be shown that $\mathcal{G}_2$ is a VC subgraph class with VC index bounded by $c'm$ for some constant $c' \ge 1$. The conclusion now follows from an application of Theorem 2.6.7 of [36] and a simple covering number calculation. □

Appendix C: Useful inequalities 

We introduce some useful inequalities. 




Theorem 3 (Rudelson's (1999) inequality). Let $Z_1, \ldots, Z_n$ be i.i.d. random vectors in $\mathbb{R}^k$ with $\Sigma := E[Z_1Z_1']$. Then, for all $k \ge 2$,

$$
E\Big[\Big\|\frac{1}{n}\sum_{i=1}^{n}Z_iZ_i' - \Sigma\Big\|_{op}\Big] \le \max\{\|\Sigma\|_{op}^{1/2}\delta, \delta^2\}, \qquad \delta := C\sqrt{\frac{\log k}{n}}\Big(E\Big[\max_{1 \le i \le n}\|Z_i\|_{\ell^2}^2\Big]\Big)^{1/2},
$$

where $\|\cdot\|_{op}$ is the operator norm and $C$ is a universal constant.

The expression of Theorem 3 is slightly different from Rudelson's original form, but is directly deduced from his proof. Theorem 3 gives moment bounds on the difference between the empirical and population Gram matrices in the operator norm. Recall that for any $k \times k$ symmetric matrix $A$, $\|A\|_{op} = \max_{v \in \mathbb{S}^{k-1}}|v'Av|$. To apply Rudelson's inequality, we have to bound $E[\max_{1 \le i \le n}\|Z_i\|_{\ell^2}^2]$, which is typically done by using the following lemma.

Lemma 5. Let $X_1, \ldots, X_n$ be arbitrary scalar random variables such that $\max_{1 \le i \le n}E[|X_i|^r] < \infty$ for some $r \ge 1$. Then, we have

$$
E\Big[\max_{1 \le i \le n}|X_i|\Big] \le C_r n^{1/r},
$$

where $C_r$ is a constant depending only on $r$ and $\max_{1 \le i \le n}E[|X_i|^r]$.

For the proof, see van der Vaart and Wellner [36, Lemma 2.2.2].

In what follows, we introduce "conditional" maximal inequalities. Below we assume the class of functions to be pointwise measurable to avoid measurability complications. A class of measurable functions $\mathcal{G}$ on a measurable space $S$ is said to be pointwise measurable if there exists a countable class of measurable functions $\mathcal{H}$ on $S$ such that for any $g \in \mathcal{G}$ there exists a sequence $\{h_k\} \subset \mathcal{H}$ with $h_k(x) \to g(x)$ for all $x \in S$. See Chapter 2.3 of [36]. This condition is satisfied in our application.

Proposition 2. Let $(\Omega, \mathcal{A}, P)$ denote the underlying probability space. Let $\mathcal{D}$ be a sub $\sigma$-field of $\mathcal{A}$. Let $\{(u_i, v_i)\}_{i=1}^{n}$ be a sequence of random variables taking values in some measurable space $S$ such that $u_1, \ldots, u_n$ are $\mathcal{D}$-measurable, the regular conditional distribution of $(v_1, \ldots, v_n)$ given $\mathcal{D}$ exists, and, conditional on $\mathcal{D}$, $v_1, \ldots, v_n$ are independent. Let $\mathcal{G}$ be a pointwise measurable class of functions on $S$ such that, for some $\mathcal{D}$-measurable random variables $B$ and $\tau$,

(i) $\sup_{g \in \mathcal{G}}|g(u_i, v_i)| \le B$, $1 \le \forall i \le n$,

(ii) $\sup_{g \in \mathcal{G}}\mathbb{E}_n[E[g^2(u_i, v_i) \mid \mathcal{D}]] \le \tau^2$,

(iii) $\tau \le B$,

almost surely. Suppose that there exist constants $A \ge 3\sqrt{e}$ and $W \ge 1$ such that

$$
N(B\epsilon, \mathcal{G}, L_2(P_n)) \le (A/\epsilon)^W, \quad 0 < \forall\epsilon \le 1, \tag{23}
$$

where $P_n$ denotes the empirical distribution on $S$ that assigns probability $n^{-1}$ to each $(u_i, v_i)$, $i = 1, \ldots, n$. Let $\sigma_1, \ldots, \sigma_n$ be independent Rademacher random variables defined on another probability space, and extend the underlying probability space by the product probability space. Then, we have

$$
E\Big[\sup_{g \in \mathcal{G}}|\mathbb{E}_n[\sigma_i g(u_i, v_i)]| \,\Big|\, \mathcal{D}\Big] \le 1(\tau > 0)D\Big[\sqrt{\frac{\tau^2 W}{n}\log\frac{AB}{\tau}} + \frac{WB}{n}\log\frac{AB}{\tau}\Big],
$$

where $D$ is a universal constant.

Proposition 2 is a conditional version of Proposition 2.1 of [20]; see also Theorem 3.1 of [21]. Here, $\{(u_i, v_i)\}_{i=1}^{n}$ are not necessarily independent; however, conditional on $\mathcal{D}$, $v_1, \ldots, v_n$ are independent.

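Proof of Proposition 2. The proof is a modification of that of Giné and Guillou [20, Proposition 2.1]. For the sake of completeness, we provide the full proof. Suppose that $E[\sup_{g \in \mathcal{G}}\mathbb{E}_n[g^2(u_i, v_i)] \mid \mathcal{D}] \wedge \tau > 0$; otherwise the conclusion follows trivially. By Dudley's inequality [see 36, Corollary 2.2.8], we have

$$
E\Big[\sup_{g \in \mathcal{G}}|\sqrt{n}\,\mathbb{E}_n[\sigma_i g(u_i, v_i)]| \,\Big|\, \{(u_i, v_i)\}_{i=1}^{n}\Big] \le D\int_0^{\theta}\sqrt{\log N(\epsilon, \mathcal{G}, L_2(P_n))}\,d\epsilon,
$$

where $\theta := (\sup_{g \in \mathcal{G}}\mathbb{E}_n[g^2(u_i, v_i)])^{1/2}$ and $D$ is a universal constant. Suppose that $\theta > 0$. Using changes of variables, we have

$$
\begin{aligned}
\int_0^{\theta}\sqrt{\log N(\epsilon, \mathcal{G}, L_2(P_n))}\,d\epsilon &= B\int_0^{\theta/B}\sqrt{\log N(B\epsilon, \mathcal{G}, L_2(P_n))}\,d\epsilon \\
&\le B\sqrt{W}\int_0^{\theta/B}\sqrt{\log(A/\epsilon)}\,d\epsilon = AB\sqrt{W}\int_{AB/\theta}^{\infty}\frac{\sqrt{\log\epsilon}}{\epsilon^2}\,d\epsilon. \qquad (24)
\end{aligned}
$$

Integration by parts gives

$$
\int_c^{\infty}\frac{\sqrt{\log\epsilon}}{\epsilon^2}\,d\epsilon = \frac{\sqrt{\log c}}{c} + \frac{1}{2}\int_c^{\infty}\frac{1}{\epsilon^2\sqrt{\log\epsilon}}\,d\epsilon \le \frac{\sqrt{\log c}}{c} + \frac{1}{2}\int_c^{\infty}\frac{\sqrt{\log\epsilon}}{\epsilon^2}\,d\epsilon,
$$

provided that $c \ge e$, by which we have

$$
\int_c^{\infty}\frac{\sqrt{\log\epsilon}}{\epsilon^2}\,d\epsilon \le \frac{2\sqrt{\log c}}{c}, \quad \text{if } c \ge e.
$$

Since $AB/\theta \ge A \ge 3\sqrt{e} > e$, we have

$$
(24) \le 2\sqrt{W}\,\theta\sqrt{\log(AB/\theta)},
$$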






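by which we have, using Hölder's inequality,

$$
E[(24) \mid \mathcal{D}] \le \sqrt{2W}\sqrt{E\Big[1(\theta > 0)\,\theta^2\log\frac{A^2B^2}{\theta^2} \,\Big|\, \mathcal{D}\Big]}.
$$

For any fixed $c > 0$, define $f(u) = u\log(c/u)$ if $u > 0$ and $f(0) = 0$. Then, $f(u)$ is concave on $[0, \infty)$. Thus, by Jensen's inequality, the last expression is bounded by

$$
\sqrt{2W}\sqrt{E\Big[\sup_{g \in \mathcal{G}}\mathbb{E}_n[g^2(u_i, v_i)] \,\Big|\, \mathcal{D}\Big] \times \log\frac{A^2B^2}{E[\sup_{g \in \mathcal{G}}\mathbb{E}_n[g^2(u_i, v_i)] \mid \mathcal{D}]}}.
$$

Using the decomposition

$$
g^2(u_i, v_i) = E[g^2(u_i, v_i) \mid \mathcal{D}] + \{g^2(u_i, v_i) - E[g^2(u_i, v_i) \mid \mathcal{D}]\}
$$

and the symmetrization inequality conditional on $\mathcal{D}$, we have

$$
\begin{aligned}
E\Big[\sup_{g \in \mathcal{G}}\mathbb{E}_n[g^2(u_i, v_i)] \,\Big|\, \mathcal{D}\Big] &\le \sup_{g \in \mathcal{G}}\mathbb{E}_n[E[g^2(u_i, v_i) \mid \mathcal{D}]] + 2E\Big[\sup_{g \in \mathcal{G}}|\mathbb{E}_n[\sigma_i g^2(u_i, v_i)]| \,\Big|\, \mathcal{D}\Big] \\
&\le \tau^2 + 2E\Big[\sup_{g \in \mathcal{G}}|\mathbb{E}_n[\sigma_i g^2(u_i, v_i)]| \,\Big|\, \mathcal{D}\Big].
\end{aligned}
$$

Using now the contraction principle [see 36, Proposition A.3.2], we have

$$
E\Big[\sup_{g \in \mathcal{G}}|\mathbb{E}_n[\sigma_i g^2(u_i, v_i)]| \,\Big|\, \{(u_i, v_i)\}_{i=1}^{n}\Big] \le 4B\,E\Big[\sup_{g \in \mathcal{G}}|\mathbb{E}_n[\sigma_i g(u_i, v_i)]| \,\Big|\, \{(u_i, v_i)\}_{i=1}^{n}\Big],
$$

so that

$$
E\Big[\sup_{g \in \mathcal{G}}\mathbb{E}_n[g^2(u_i, v_i)] \,\Big|\, \mathcal{D}\Big] \le \tau^2 + 8BZ, \qquad Z := E\Big[\sup_{g \in \mathcal{G}}|\mathbb{E}_n[\sigma_i g(u_i, v_i)]| \,\Big|\, \mathcal{D}\Big].
$$

Note that the right side is at most $9B^2$. Since, for any given $c > 0$, the map $u \mapsto u\log(c/u)$ is non-decreasing for $0 < u \le c/e$, and $A \ge 3\sqrt{e}$, we have

$$
\begin{aligned}
E\Big[\sup_{g \in \mathcal{G}}\mathbb{E}_n[g^2(u_i, v_i)] \,\Big|\, \mathcal{D}\Big] \times \log\frac{A^2B^2}{E[\sup_{g \in \mathcal{G}}\mathbb{E}_n[g^2(u_i, v_i)] \mid \mathcal{D}]}
&\le (\tau^2 + 8BZ)\log\frac{A^2B^2}{\tau^2 + 8BZ} \\
&\le (\tau^2 + 8BZ)\log\frac{A^2B^2}{\tau^2} = 2(\tau^2 + 8BZ)\log\frac{AB}{\tau}.
\end{aligned}
$$

Taking these together, we have

$$
\sqrt{n}\,Z \le 2D\sqrt{W}\sqrt{(\tau^2 + 8BZ)\log\frac{AB}{\tau}}.
$$

Solving this inequality with respect to $Z$ gives the desired bound. □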
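The integral inequality $\int_c^\infty \sqrt{\log\epsilon}\,\epsilon^{-2}\,d\epsilon \le 2\sqrt{\log c}/c$ for $c \ge e$, used in the proof of Proposition 2 above, can be checked numerically. Substituting $\epsilon = c/s$ turns the left side into $c^{-1}\int_0^1\sqrt{\log c - \log s}\,ds$, an integral over a bounded interval with only an integrable logarithmic singularity at $s = 0$, which a midpoint rule handles well. For $c = e$ the integral also has the closed form $\sqrt{\pi}\,\mathrm{erfc}(1)/2 + e^{-1}$, which serves as an accuracy check. The helper name is illustrative.

```python
import math

def tail_integral(c, n=100_000):
    """int_c^inf sqrt(log e) / e^2 de via the substitution e = c / s and the midpoint rule."""
    h = 1.0 / n
    total = sum(math.sqrt(math.log(c) - math.log((k + 0.5) * h)) for k in range(n))
    return h * total / c
```

The ratio of the integral to $\sqrt{\log c}/c$ tends to one as $c$ grows, so the factor 2 leaves ample room.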



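Proposition 3. Consider the same setting as in Proposition 2. Instead of (i)-(iii) and (23), suppose that there is an envelope function $G$ for $\mathcal{G}$ such that, for some constants $A \ge e$ and $W \ge 1$,

$$
N(\epsilon\|G\|_{L_2(P_n)}, \mathcal{G}, L_2(P_n)) \le (A/\epsilon)^W, \quad 0 < \forall\epsilon \le 1.
$$

Then, we have for all $q \in [1, \infty)$,

$$
\Big(E\Big[\sup_{g \in \mathcal{G}}|\mathbb{E}_n[\sigma_i g(u_i, v_i)]|^q \,\Big|\, \mathcal{D}\Big]\Big)^{1/q} \le Dn^{-1/2}(\mathbb{E}_n[E[G^{\bar q}(u_i, v_i) \mid \mathcal{D}]])^{1/\bar q}\sqrt{W\log A} \quad \text{a.s.},
$$

where $\bar q = q \vee 2$ and $D$ is a universal constant.

Proof. By Dudley's inequality,

$$
\Big(E\Big[\sup_{g \in \mathcal{G}}|\sqrt{n}\,\mathbb{E}_n[\sigma_i g(u_i, v_i)]|^q \,\Big|\, \{(u_i, v_i)\}_{i=1}^{n}\Big]\Big)^{1/q} \le D\int_0^{\theta}\sqrt{\log N(\epsilon, \mathcal{G}, L_2(P_n))}\,d\epsilon,
$$

where $\theta := (\sup_{g \in \mathcal{G}}\mathbb{E}_n[g^2(u_i, v_i)])^{1/2} \le (\mathbb{E}_n[G^2(u_i, v_i)])^{1/2}$ and $D$ is a universal constant. Changes of variables imply that the right side is bounded by

$$
D(\mathbb{E}_n[G^2(u_i, v_i)])^{1/2}\sqrt{W}\int_0^1\sqrt{\log(A/\epsilon)}\,d\epsilon.
$$

If $q \ge 2$, then by Hölder's inequality,

$$
E[(\mathbb{E}_n[G^2(u_i, v_i)])^{q/2} \mid \mathcal{D}] \le E[\mathbb{E}_n[G^q(u_i, v_i)] \mid \mathcal{D}] = \mathbb{E}_n[E[G^q(u_i, v_i) \mid \mathcal{D}]].
$$

On the other hand, if $q \in [1, 2)$,

$$
E[(\mathbb{E}_n[G^2(u_i, v_i)])^{q/2} \mid \mathcal{D}] \le (\mathbb{E}_n[E[G^2(u_i, v_i) \mid \mathcal{D}]])^{q/2}.
$$

This leads to the desired inequality. □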